TL;DR: SVG code as a Visual Representation
See our demo video for fun!
VCode_demo_video.mp4
- [2025.11.08] 🎉 Added Gemini-3-Pro to our benchmark, showing excellent performance.
- [2025.11.08] 🔥 Released our demo video featuring lots of fun memes and reaction images converted into SVGs.
- [2025.11.08] 🎉 We now offer a free trial API on our 🤗 HuggingFace Space.
- [2025.11.05] 🔥 We are honored to be featured as 🤗 HuggingFace Daily Paper #1.
Environment
git clone -b main --single-branch https://github.com/CSU-JPG/VCode.git
cd VCode
conda create -n vcode python=3.10.2 -y
conda activate vcode
conda install pytorch=2.5.1 torchvision=0.20.1 torchaudio=2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt

VCode-suite is a comprehensive toolkit that automates the full image-to-SVG-to-render workflow. It includes both integrated pipelines and independent modules for generation, rendering, and revision. Users can either run the end-to-end pipelines for batch processing or execute individual scripts for customized control.
📁 vcode-suite/
├── filter.py
├── img2svg.py
├── img2svgthinking.py
├── img2svg-w-visual-tool.py
├── img2text2svg.py
├── pipeline.sh
├── revision_pipeline.sh
├── revision.py
└── svg_render_img.py
💡 Tip: The pipelines (`pipeline.sh`, `revision_pipeline.sh`) perform fully automated batch processing, while the Python scripts (`img2svg.py`, `img2text2svg.py`, `revision.py`, etc.) can be run independently to support flexible, modular experimentation within the VCode framework.
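As a rough illustration of what the filtering stage checks, here is a minimal well-formedness filter in the spirit of `filter.py` (the actual filtering criteria in the repo may be stricter):

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def is_valid_svg(path: Path) -> bool:
    """Return True if the file parses as XML with a root <svg> element."""
    try:
        root = ET.parse(path).getroot()
    except ET.ParseError:
        return False
    # The tag may be namespaced, e.g. '{http://www.w3.org/2000/svg}svg'
    return root.tag.split("}")[-1] == "svg"


def filter_folder(folder: str) -> list[Path]:
    """Keep only the SVG files under `folder` that parse cleanly."""
    return [p for p in Path(folder).glob("**/*.svg") if is_valid_svg(p)]
```

Malformed model output (truncated tags, stray text around the SVG) would be dropped here before rendering.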
pipeline.sh orchestrates the full image-to-SVG-to-render workflow.
It can connect to different generation modules (`img2svg`, `img2text2svg`, or `img2svgthinking`) to convert images into SVGs, then filter and render them into pixel images.
chmod +x pipeline.sh
./pipeline.sh

revision_pipeline.sh automates the revision and optimization process.
It takes the previously generated SVGs (generated_svgs/) and rendered images (generated_imgs/), calls the API-based revision module, and outputs the optimized SVGs and renders to optimized_svgs/ and optimized_imgs/.
chmod +x revision_pipeline.sh
./revision_pipeline.sh

Both generation and revision scripts can be executed independently for flexible and customized workflows.
Each core generation script (`img2svg.py`, `img2text2svg.py`, `img2svgthinking.py`, and `img2svg-w-visual-tool.py`) can directly convert input images into SVG code.
Similarly, revision.py can be run independently to optimize previously generated SVGs through visual feedback.
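Run independently, the revision step needs each SVG matched with its original and rendered image. A minimal pairing sketch by filename stem (illustrative only; the actual matching logic lives in `revision.py`):

```python
from pathlib import Path


def pair_for_revision(svg_dir, original_dir, rendered_dir):
    """Yield (svg, original, rendered) path triples matched by file stem."""
    originals = {p.stem: p for p in Path(original_dir).iterdir() if p.is_file()}
    rendered = {p.stem: p for p in Path(rendered_dir).iterdir() if p.is_file()}
    for svg in sorted(Path(svg_dir).glob("*.svg")):
        # Skip SVGs that are missing either counterpart
        if svg.stem in originals and svg.stem in rendered:
            yield svg, originals[svg.stem], rendered[svg.stem]
```

This is why the folders passed via `--svg-folder`, `--original-folder`, and `--rendered-folder` should keep consistent file names.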
Run img2svg.py
python vcode-suite/img2svg.py \
/path/to/input_images \
./generated_svgs \
--model gpt-5 \
--base-url https://openrouter.ai/api/v1 \
--api-key <OPENROUTER_API_KEY> \
  --max-tokens 16384

| Argument | Type | Default | Description |
|---|---|---|---|
| `images_folder` | str | - | Path to the input folder containing image files. |
| `svg_output_folder` | str | - | Directory to save the generated SVG files. |
| `--model` | str | `gpt-5` | API model name used for conversion. |
| `--base-url` | str | `https://openrouter.ai/api/v1` | Base URL of the API endpoint. |
| `--api-key` | str | - | API key for authentication. |
| `--sleep` | int | `5` | Seconds to wait between consecutive API calls. |
| `--max-tokens` | int | `16384` | Maximum number of tokens allowed in the model's response. |
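For reference, OpenAI-compatible endpoints such as the OpenRouter URL above accept chat payloads with base64-encoded images. A sketch of how such a request might be assembled (prompt wording and payload details are illustrative, not copied from `img2svg.py`):

```python
import base64


def build_img2svg_request(image_bytes: bytes, model: str = "gpt-5",
                          max_tokens: int = 16384) -> dict:
    """Build an OpenAI-style chat payload asking the model for SVG code.

    The prompt text and payload shape here are assumptions for
    illustration; the repo's scripts may differ in detail.
    """
    b64 = base64.b64encode(image_bytes).decode("ascii")
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text",
                 "text": "Reproduce this image as self-contained SVG code."},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{b64}"}},
            ],
        }],
    }
```

The `--sleep` interval between such calls helps stay under provider rate limits during batch runs.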
Run revision.py
python vcode-suite/revision.py \
--svg-folder ./generated_svgs \
--original-folder ./input_images \
--rendered-folder ./generated_imgs \
--output-folder ./optimized_svgs \
--analysis-folder ./visual_analysis \
--base-url https://openrouter.ai/api/v1 \
--api-key <OPENROUTER_API_KEY> \
--model gpt-5 \
  --max-tokens 16384

| Argument | Type | Default | Description |
|---|---|---|---|
| `--svg-folder` | str | - | Root directory containing the SVG files to optimize. |
| `--original-folder` | str | - | Directory of the original reference images. |
| `--rendered-folder` | str | - | Directory of rendered images corresponding to the SVGs. |
| `--output-folder` | str | - | Directory to save the optimized SVG files. |
| `--analysis-folder` | str | - | Directory to save visual comparison and analysis txt files. |
| `--base-url` | str | `https://openrouter.ai/api/v1` | Base URL of the API endpoint. |
| `--api-key` | str | - | API key for authentication. |
| `--model` | str | `gpt-5` | Model used for revision. |
| `--max-tokens` | int | `16384` | Maximum tokens allowed in the model response. |
💡 Tip: The `revision.py` script refines existing SVGs based on visual-comparison feedback, while the generation scripts (`img2svg.py`, `img2text2svg.py`, `img2svgthinking.py`, `img2svg-w-visual-tool.py`) create SVGs from the input images folder. You can flexibly mix and match these tools depending on your pipeline needs.
Use the VCode-suite pipeline (or standalone scripts) to render images for each dataset.
Original images are already in data/:
- MM-Vet: `data/mm-vet/images`
- CV-Bench: `data/cv-bench`
- MMMU: `data/mmmu/mmmu_dev_processed_single_img_subset`
Running your pipeline will produce, per dataset, a folder like:
generated_svgs/
generated_imgs/   ← used by the evaluators
Each evaluator is a shell script under `evaluation/…`. They all follow the same usage:
chmod +x evaluation/mm-vet/mmvet_eval.sh
./evaluation/mm-vet/mmvet_eval.sh

chmod +x evaluation/cv-bench/cvbench_eval.sh
./evaluation/cv-bench/cvbench_eval.sh

chmod +x evaluation/mmmu/mmmu_eval.sh
./evaluation/mmmu/mmmu_eval.sh

These scripts will read your generated_imgs/ and compute scores.
💡 Reference: For directory organization and example script configuration, see `example_results/` (it shows a working layout you can mirror).
Full Command with Options
python metrics.py \
--folder1 /path/to/reference_images \
--folder2 /path/to/model_outputs/gpt-4o \
  --ckpt google/siglip2-so400m-patch14-384

Command Line Arguments
| Argument | Required | Default | Description |
|---|---|---|---|
| `--folder1` | ✅ Yes | - | Path to the reference images folder |
| `--folder2` | ✅ Yes | - | Path to the model output folder (containing `generated_imgs/` and `generated_svgs/`) |
| `--ckpt` | ❌ No | `google/siglip2-so400m-patch14-384` | SigLIP model checkpoint |
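metrics.py scores model outputs with SigLIP embeddings; at its core this reduces to cosine similarity between the embedding vectors of a reference image and its rendered counterpart. A dependency-free sketch of that final step (the embeddings themselves come from the SigLIP checkpoint above):

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)
```

In practice the vectors are high-dimensional model embeddings rather than toy lists, but the score is scale-invariant either way: identical directions give 1.0, orthogonal ones 0.0.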
Expected Directory Layout:
Reference Images Folder (--folder1)
Location: data/mm-vet/images (example path - can be customized)
folder1/
├── category1/
│   ├── image001.png
│   ├── image002.jpg
│   └── ...
├── category2/
│   ├── image003.png
│   └── ...
└── ...
Model Output Folder (--folder2)
Location: example_results/mm-vet/Gemini-2.5-Pro (example path - can be customized)
folder2/
├── generated_imgs/          # Generated/rendered images
│   ├── category1/
│   │   ├── image001.png
│   │   ├── image002.jpg
│   │   └── ...
│   ├── category2/
│   │   ├── image003.png
│   │   └── ...
│   └── ...
│
└── generated_svgs/          # SVG source files
    ├── category1/
    │   ├── image001.svg
    │   ├── image002.svg
    │   └── ...
    ├── category2/
    │   ├── image003.svg
    │   └── ...
    └── ...
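Before running metrics.py, it can help to sanity-check that `--folder2` matches the layout above. A small checker sketch (the function name and specific checks are ours, not part of the repo):

```python
from pathlib import Path


def check_model_output_layout(folder2) -> list[str]:
    """Report layout problems in a --folder2 directory.

    Checks only the structure documented above: generated_imgs/ and
    generated_svgs/ must exist, and every SVG should have a rendered
    image with the same category and file stem.
    """
    root = Path(folder2)
    imgs, svgs = root / "generated_imgs", root / "generated_svgs"
    problems = [f"missing directory: {sub.name}/"
                for sub in (imgs, svgs) if not sub.is_dir()]
    if problems:
        return problems
    # Index rendered images as (category, stem) pairs
    rendered = {(p.parent.name, p.stem)
                for p in imgs.glob("*/*") if p.is_file()}
    for svg in sorted(svgs.glob("*/*.svg")):
        if (svg.parent.name, svg.stem) not in rendered:
            problems.append(f"no rendered image for {svg.parent.name}/{svg.name}")
    return problems
```

An empty list means the folder should be safe to pass to metrics.py; any entries point at files the evaluators would miss.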
If you find our work useful, please cite:
@misc{vcode,
title={VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation},
author={Kevin Qinghong Lin and Yuhao Zheng and Hangyu Ran and Dantong Zhu and Dongxing Mao and Linjie Li and Philip Torr and Alex Jinpeng Wang},
year={2025},
eprint={2511.02778},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2511.02778},
}