CSU-JPG/VCode

🎨 VCode: SVG as Symbolic Visual Representation

TL;DR: SVG code as a Visual Representation

Overview

See our demo video for fun!

VCode_demo_video.mp4

📣 News

  • [2025.11.08] 🌟 Added Gemini-3-Pro to our benchmark, showing excellent performance.
  • [2025.11.08] 🎥 Released our demo video featuring lots of fun memes and reaction images converted into SVGs.
  • [2025.11.08] 🚀 We now offer a free trial API on our 🤗 HuggingFace Space.
  • [2025.11.05] 🔥 We are honored to be featured as 🤗 HuggingFace Daily Paper #1.

🛠️ Installation

Environment

git clone -b main --single-branch https://github.com/CSU-JPG/VCode.git
cd VCode
conda create -n vcode python=3.10.2 -y
conda activate vcode
conda install pytorch=2.5.1 torchvision=0.20.1 torchaudio=2.5.1 pytorch-cuda=12.4 -c pytorch -c nvidia
pip install -r requirements.txt

🚀 Quick Start

🧩 VCode-suite

VCode-suite is a comprehensive toolkit that automates the full image-to-SVG-to-render workflow. It includes both integrated pipelines and independent modules for generation, rendering, and revision. Users can either run the end-to-end pipelines for batch processing, or execute individual scripts for customized control.

📁 vcode-suite/
├── filter.py
├── img2svg.py
├── img2svgthinking.py
├── img2svg-w-visual-tool.py
├── img2text2svg.py
├── pipeline.sh
├── revision_pipeline.sh
├── revision.py
└── svg_render_img.py

💡 Tip: The pipelines (pipeline.sh, revision_pipeline.sh) perform fully automated batch processing, while the Python scripts (img2svg.py, img2text2svg.py, revision.py, etc.) can be run independently to support flexible and modular experimentation within the VCode framework.

โš™๏ธ Usage

1๏ธโƒฃ Generate and render SVGs

pipeline.sh orchestrates the full image-to-SVG-to-render workflow. It can connect to different generation modules โ€” img2svg, img2text2svg, or img2svgthinking โ€” to convert images into SVGs, then filter and render them into pixel images.

chmod +x pipeline.sh
./pipeline.sh
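
The README does not say what the filter step checks before rendering. As a minimal sketch, assuming the goal is to drop model outputs that are not well-formed SVG, a check like the following could sit between generation and rendering. The function name `is_renderable_svg` is hypothetical and is not taken from filter.py.

```python
import xml.etree.ElementTree as ET

# Namespace prefix that ElementTree attaches to tags of namespaced SVG documents.
SVG_NS = "{http://www.w3.org/2000/svg}"


def is_renderable_svg(svg_text: str) -> bool:
    """Return True if the text parses as XML and its root element is <svg>.

    Hypothetical check for illustration; the real filter.py may apply
    different or additional rules.
    """
    try:
        root = ET.fromstring(svg_text)
    except ET.ParseError:
        # Truncated or malformed output (e.g. the model hit its token limit).
        return False
    return root.tag in ("svg", SVG_NS + "svg")


good = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="1" height="1"/></svg>'
truncated = '<svg xmlns="http://www.w3.org/2000/svg"><rect width="1"'
print(is_renderable_svg(good), is_renderable_svg(truncated))
```

A check of this kind only guards against unparseable files; it says nothing about whether the SVG resembles the input image.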

2️⃣ Optimize generated SVGs

revision_pipeline.sh automates the revision and optimization process. It takes the previously generated SVGs (generated_svgs/) and rendered images (generated_imgs/), calls the API-based revision module, and outputs the optimized SVGs and renders to optimized_svgs/ and optimized_imgs/.

chmod +x revision_pipeline.sh
./revision_pipeline.sh

3️⃣ Run scripts independently

Both generation and revision scripts can be executed independently for flexible and customized workflows.

Each core generation script (img2svg.py, img2text2svg.py, img2svgthinking.py, and img2svg-w-visual-tool.py) can directly convert input images into SVG code. Similarly, revision.py can be run independently to optimize previously generated SVGs through visual feedback.


Run img2svg.py

python vcode-suite/img2svg.py \
/path/to/input_images \
./generated_svgs \
--model gpt-5 \
--base-url https://openrouter.ai/api/v1 \
--api-key <OPENROUTER_API_KEY> \
--max-tokens 16384

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| images_folder | str | - | Path to the input folder containing image files. |
| svg_output_folder | str | - | Directory to save the generated SVG files. |
| --model | str | gpt-5 | API model name used for conversion. |
| --base-url | str | https://openrouter.ai/api/v1 | Base URL of the API endpoint. |
| --api-key | str | - | API key for authentication. |
| --sleep | int | 5 | Seconds to wait between consecutive API calls. |
| --max-tokens | int | 16384 | Maximum number of tokens allowed in the model's response. |
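
To make the argument table concrete, here is a minimal sketch of an `argparse` parser that mirrors the documented interface. It is reconstructed from the table only; the actual parser inside img2svg.py may differ, and `build_parser` is a hypothetical name.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Parser mirroring the documented img2svg.py arguments (a reconstruction)."""
    parser = argparse.ArgumentParser(description="Convert images to SVG via an API model.")
    # Two positional arguments, in the order used by the example command.
    parser.add_argument("images_folder", type=str,
                        help="Path to the input folder containing image files.")
    parser.add_argument("svg_output_folder", type=str,
                        help="Directory to save the generated SVG files.")
    # Optional arguments with the defaults listed in the table.
    parser.add_argument("--model", type=str, default="gpt-5")
    parser.add_argument("--base-url", type=str, default="https://openrouter.ai/api/v1")
    parser.add_argument("--api-key", type=str, default=None)
    parser.add_argument("--sleep", type=int, default=5)
    parser.add_argument("--max-tokens", type=int, default=16384)
    return parser


# Parse a command line that relies on the defaults.
args = build_parser().parse_args(["/path/to/input_images", "./generated_svgs"])
print(args.model, args.base_url, args.sleep, args.max_tokens)
```

Note that `argparse` exposes `--base-url` and `--max-tokens` as `args.base_url` and `args.max_tokens`.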

Run revision.py

python vcode-suite/revision.py \
--svg-folder ./generated_svgs \
--original-folder ./input_images \
--rendered-folder ./generated_imgs \
--output-folder ./optimized_svgs \
--analysis-folder ./visual_analysis \
--base-url https://openrouter.ai/api/v1 \
--api-key <OPENROUTER_API_KEY> \
--model gpt-5 \
--max-tokens 16384

| Argument | Type | Default | Description |
| --- | --- | --- | --- |
| --svg-folder | str | - | Root directory containing the SVG files to optimize. |
| --original-folder | str | - | Directory of the original reference images. |
| --rendered-folder | str | - | Directory of rendered images corresponding to the SVGs. |
| --output-folder | str | - | Directory to save the optimized SVG files. |
| --analysis-folder | str | - | Directory to save visual comparison and analysis text files. |
| --base-url | str | https://openrouter.ai/api/v1 | Base URL of the API endpoint. |
| --api-key | str | - | API key for authentication. |
| --model | str | gpt-5 | Model used for revision. |
| --max-tokens | int | 16384 | Maximum number of tokens allowed in the model's response. |
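
revision.py needs three inputs per sample: the SVG, the original image, and the rendered image. The sketch below is a pre-flight check you could run before spending API calls. It assumes the three folders share the same relative paths and file stems, which the README implies but does not state; `find_image`, `pair_inputs`, and `IMAGE_EXTS` are hypothetical names.

```python
import tempfile
from pathlib import Path

IMAGE_EXTS = (".png", ".jpg", ".jpeg")  # assumed set of image extensions


def find_image(folder: Path, rel_svg: Path):
    """Return the image in `folder` sharing the SVG's relative stem, or None."""
    for ext in IMAGE_EXTS:
        candidate = folder / rel_svg.parent / (rel_svg.stem + ext)
        if candidate.is_file():
            return candidate
    return None


def pair_inputs(svg_folder: Path, original_folder: Path, rendered_folder: Path):
    """Split SVGs into complete (svg, original, rendered) triples and incomplete ones."""
    complete, incomplete = [], []
    for svg in sorted(svg_folder.rglob("*.svg")):
        rel = svg.relative_to(svg_folder)
        original = find_image(original_folder, rel)
        rendered = find_image(rendered_folder, rel)
        if original is not None and rendered is not None:
            complete.append((svg, original, rendered))
        else:
            incomplete.append(rel)
    return complete, incomplete


# Tiny demo on a throwaway directory tree: dog.svg has no rendered image.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("generated_svgs/a/cat.svg", "generated_svgs/a/dog.svg",
                 "input_images/a/cat.jpg", "input_images/a/dog.png",
                 "generated_imgs/a/cat.png"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    complete, incomplete = pair_inputs(
        root / "generated_svgs", root / "input_images", root / "generated_imgs")
    ready = [svg.name for svg, _, _ in complete]
    skipped = [rel.as_posix() for rel in incomplete]

print(ready, skipped)
```

Anything reported as skipped is an SVG that was never rendered or has no matching original, so it is worth resolving before starting a revision run.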

💡 Tip: The revision.py script refines existing SVGs based on visual comparison feedback, while the generation scripts (img2svg.py, img2text2svg.py, img2svgthinking.py, img2svg-w-visual-tool.py) create SVGs from the images in images_folder. You can mix and match these tools depending on your pipeline needs.


🔮 Evaluation

⚙️ Usage

1️⃣ Generate images for all three datasets

Use the VCode-suite pipeline (or standalone scripts) to render images for each dataset. Original images are already in data/:

  • MM-Vet: data/mm-vet/images
  • CV-Bench: data/cv-bench
  • MMMU: data/mmmu/mmmu_dev_processed_single_img_subset

Running your pipeline will produce, per dataset, a folder like:

generated_svgs/
generated_imgs/  ← used by the evaluators
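
As a sketch of running generation once per dataset, the snippet below builds one img2svg.py command for each of the three image folders listed above. The output layout `results/<dataset>/<model>/generated_svgs` is an assumption modeled on the `example_results/mm-vet/Gemini-2.5-Pro` path; `DATASETS`, `img2svg_command`, and `plan_runs` are hypothetical names. The commands are only constructed here, not executed.

```python
import sys
from pathlib import Path

# Dataset image folders as listed in this README.
DATASETS = {
    "mm-vet": Path("data/mm-vet/images"),
    "cv-bench": Path("data/cv-bench"),
    "mmmu": Path("data/mmmu/mmmu_dev_processed_single_img_subset"),
}


def img2svg_command(images: Path, output_root: Path, model: str) -> list:
    """Build the argv for one img2svg.py run using only documented arguments."""
    return [
        sys.executable, "vcode-suite/img2svg.py",
        images.as_posix(),
        (output_root / "generated_svgs").as_posix(),
        "--model", model,
    ]


def plan_runs(results_root: Path, model: str) -> dict:
    """Map each dataset name to its command; the results layout is an assumption."""
    return {
        name: img2svg_command(images, results_root / name / model, model)
        for name, images in DATASETS.items()
    }


plan = plan_runs(Path("results"), "gpt-5")
for name, command in plan.items():
    print(name, " ".join(command[1:]))
```

To execute a planned command, append `--api-key` with your key and pass the list to `subprocess.run(command, check=True)`.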

2️⃣ Run each dataset's evaluator

Each evaluator is a shell script under evaluation/…. They all follow the same usage:

chmod +x evaluation/mm-vet/mmvet_eval.sh
./evaluation/mm-vet/mmvet_eval.sh
chmod +x evaluation/cv-bench/cvbench_eval.sh
./evaluation/cv-bench/cvbench_eval.sh
chmod +x evaluation/mmmu/mmmu_eval.sh
./evaluation/mmmu/mmmu_eval.sh

These scripts will read your generated_imgs/ and compute scores.

💡 Reference: For directory organization and example script configuration, see example_results/ (it shows a working layout you can mirror).


3️⃣ Calculate each dataset's metrics

Full Command with Options

python metrics.py \
--folder1 /path/to/reference_images \
--folder2 /path/to/model_outputs/gpt-4o \
--ckpt google/siglip2-so400m-patch14-384

Command Line Arguments

| Argument | Required | Default | Description |
| --- | --- | --- | --- |
| --folder1 | ✅ Yes | - | Path to reference images folder |
| --folder2 | ✅ Yes | - | Path to model output folder (containing generated_imgs/ and generated_svgs/) |
| --ckpt | ❌ No | google/siglip2-so400m-patch14-384 | SigLIP model checkpoint |

Expected Directory Layout:

Reference Images Folder (--folder1)

Location: data/mm-vet/images (example path - can be customized)

folder1/
├── category1/
│   ├── image001.png
│   ├── image002.jpg
│   └── ...
├── category2/
│   ├── image003.png
│   └── ...
└── ...

Model Output Folder (--folder2)

Location: example_results/mm-vet/Gemini-2.5-Pro (example path - can be customized)

folder2/
├── generated_imgs/           # Generated/rendered images
│   ├── category1/
│   │   ├── image001.png
│   │   ├── image002.jpg
│   │   └── ...
│   ├── category2/
│   │   ├── image003.png
│   │   └── ...
│   └── ...
│
└── generated_svgs/           # SVG source files
    ├── category1/
    │   ├── image001.svg
    │   ├── image002.svg
    │   └── ...
    ├── category2/
    │   ├── image003.svg
    │   └── ...
    └── ...
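
Before running metrics.py, it can help to confirm that the two folders line up. The sketch below counts, per category, how many reference images in `--folder1` have a rendered counterpart under `generated_imgs/` in `--folder2`. Matching by relative path and file stem is an assumption based on the layout above, and `coverage_by_category` is a hypothetical helper that is not part of metrics.py.

```python
import tempfile
from collections import Counter
from pathlib import Path


def coverage_by_category(folder1: Path, folder2: Path) -> dict:
    """Return {category: (matched, total)} for reference images with a render."""
    rendered_root = folder2 / "generated_imgs"
    # Relative paths without extension, e.g. category1/image001.
    rendered = {
        p.relative_to(rendered_root).with_suffix("")
        for p in rendered_root.rglob("*") if p.is_file()
    }
    total, matched = Counter(), Counter()
    for ref in folder1.rglob("*"):
        if not ref.is_file():
            continue
        rel = ref.relative_to(folder1)
        # Files directly under folder1 are grouped under ".".
        category = rel.parts[0] if len(rel.parts) > 1 else "."
        total[category] += 1
        if rel.with_suffix("") in rendered:
            matched[category] += 1
    return {c: (matched[c], total[c]) for c in sorted(total)}


# Tiny demo: image002 has no rendered counterpart.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    for name in ("folder1/category1/image001.png", "folder1/category1/image002.jpg",
                 "folder1/category2/image003.png",
                 "folder2/generated_imgs/category1/image001.png",
                 "folder2/generated_imgs/category2/image003.png"):
        path = root / name
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()
    report = coverage_by_category(root / "folder1", root / "folder2")

print(report)
```

A category whose matched count is lower than its total points to images that failed generation, filtering, or rendering. The README does not say how metrics.py treats such gaps, so check them before comparing scores across models.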

📌 Citation

If you find our work useful, please cite:

@misc{vcode,
      title={VCode: a Multimodal Coding Benchmark with SVG as Symbolic Visual Representation}, 
      author={Kevin Qinghong Lin and Yuhao Zheng and Hangyu Ran and Dantong Zhu and Dongxing Mao and Linjie Li and Philip Torr and Alex Jinpeng Wang},
      year={2025},
      eprint={2511.02778},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2511.02778}, 
}
