 ---
+- &qwen3vl
+  url: "github:mudler/LocalAI/gallery/qwen3.yaml@master"
+  icon: https://cdn-avatars.huggingface.co/v1/production/uploads/620760a26e3b7210c2ff1943/-s1gyJfvbE1RgO5iBeNOi.png
+  license: apache-2.0
+  tags:
+    - llm
+    - gguf
+    - gpu
+    - image-to-text
+    - multimodal
+    - cpu
+    - qwen
+    - qwen3
+    - thinking
+    - reasoning
+  name: "qwen3-vl-30b-a3b-instruct"
+  urls:
+    - https://huggingface.co/unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF
+  description: |
+    Meet Qwen3-VL — the most powerful vision-language model in the Qwen series to date.
+
+    This generation delivers comprehensive upgrades across the board: superior text understanding & generation, deeper visual perception & reasoning, extended context length, enhanced spatial and video dynamics comprehension, and stronger agent interaction capabilities.
+
+    Available in Dense and MoE architectures that scale from edge to cloud, with Instruct and reasoning‑enhanced Thinking editions for flexible, on-demand deployment.
+
+    #### Key Enhancements:
+
+    * **Visual Agent**: Operates PC/mobile GUIs—recognizes elements, understands functions, invokes tools, completes tasks.
+
+    * **Visual Coding Boost**: Generates Draw.io/HTML/CSS/JS from images/videos.
+
+    * **Advanced Spatial Perception**: Judges object positions, viewpoints, and occlusions; provides stronger 2D grounding and enables 3D grounding for spatial reasoning and embodied AI.
+
+    * **Long Context & Video Understanding**: Native 256K context, expandable to 1M; handles books and hours-long video with full recall and second-level indexing.
+
+    * **Enhanced Multimodal Reasoning**: Excels in STEM/Math—causal analysis and logical, evidence-based answers.
+
+    * **Upgraded Visual Recognition**: Broader, higher-quality pretraining lets the model “recognize everything”: celebrities, anime, products, landmarks, flora/fauna, etc.
+
+    * **Expanded OCR**: Supports 32 languages (up from 19); robust in low light, blur, and tilt; better with rare/ancient characters and jargon; improved long-document structure parsing.
+
+    * **Text Understanding on par with pure LLMs**: Seamless text–vision fusion for lossless, unified comprehension.
+
+    #### Model Architecture Updates:
+
+    1. **Interleaved-MRoPE**: Full‑frequency allocation over time, width, and height via robust positional embeddings, enhancing long‑horizon video reasoning.
+
+    2. **DeepStack**: Fuses multi‑level ViT features to capture fine-grained details and sharpen image–text alignment.
+
+    3. **Text–Timestamp Alignment**: Moves beyond T‑RoPE to precise, timestamp‑grounded event localization for stronger video temporal modeling.
+
+    This is the weight repository for Qwen3-VL-30B-A3B-Instruct.
+  overrides:
+    mmproj: mmproj/mmproj-F16.gguf
+    parameters:
+      model: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
+  files:
+    - filename: Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
+      sha256: 75d8f4904016d90b71509c8576ebd047a0606cc5aa788eada29d4bedf9b761a6
+      uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/Qwen3-VL-30B-A3B-Instruct-Q4_K_M.gguf
+    - filename: mmproj/mmproj-F16.gguf
+      sha256: 7e7cec67a3a887bddbf38099738d08570e85f08dd126578fa00a7acf4dacef01
+      uri: huggingface://unsloth/Qwen3-VL-30B-A3B-Instruct-GGUF/mmproj-F16.gguf
 - &jamba
   icon: https://cdn-avatars.huggingface.co/v1/production/uploads/65e60c0ed5313c06372446ff/QwehUHgP2HtVAMW5MzJ2j.png
   name: "ai21labs_ai21-jamba-reasoning-3b"
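
For reference, a minimal sketch of exercising the gallery entry added above through LocalAI's OpenAI-compatible chat completions endpoint. The host, port, prompt, and image URL are assumptions for illustration only; the model name comes from the entry's `name` field, and the `mmproj` override is what enables the image input path.

```python
# Minimal sketch, assuming a LocalAI instance on http://localhost:8080 with the
# qwen3-vl-30b-a3b-instruct entry above installed. Endpoint host/port and the
# example image URL are assumptions, not part of the gallery entry itself.
import requests

payload = {
    "model": "qwen3-vl-30b-a3b-instruct",  # matches the entry's `name` field
    "messages": [
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this image in one sentence."},
                {"type": "image_url", "image_url": {"url": "https://example.com/sample.jpg"}},
            ],
        }
    ],
}

# Send the request and print the model's reply.
resp = requests.post("http://localhost:8080/v1/chat/completions", json=payload, timeout=600)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])
```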