A powerful, large-scale, multimodal Text-to-Image generation model, fully open-source and commercial-grade.
gen-image3.0 is a state-of-the-art AI model that generates high-quality images from textual descriptions. It uses a unified autoregressive multimodal framework that models text and images jointly, turning detailed textual descriptions into visually compelling outputs.
In simple terms:
- What it is: An AI model that turns text prompts into images.
- Who made it: Your team (inspired by open-source breakthroughs).
- Why it matters: It’s large, intelligent, accurate in rendering details (including text), and fully open-source for commercial use.
- Unified Multimodal Architecture: Integrates text and image modalities for contextually rich image generation.
- Largest Open-Source Image Generation Model: ~80 billion parameters with a Mixture-of-Experts (MoE) design (13B active per token).
- World-Knowledge Reasoning: Can intelligently fill missing details using common sense.
- Ultra-Long Prompt Understanding: Handles text prompts over 1,000 characters for fine-grained scene control.
- Accurate Text Rendering: Supports precise generation of titles, logos, annotations, and multilingual text.
- Commercial Use: Fully open-source for developers and businesses (some geographic restrictions may apply).
Due to its size, gen-image3.0 requires high-end hardware:
- GPU Memory: ≥3 × 80GB VRAM (4 × 80GB recommended, e.g., NVIDIA A100/H100)
- Disk Space: 170GB for model weights
- Operating System: Linux with CUDA 12.8
- Python: 3.12+
- PyTorch: 2.7.1 with CUDA 12.8
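To sanity-check the environment before downloading the weights, a short script like the one below can report the GPU count and per-device memory (this is a convenience sketch, not part of the repository):

```python
import torch

# Report the PyTorch/CUDA build and the memory of each visible GPU.
print(f"PyTorch {torch.__version__}, built for CUDA {torch.version.cuda}")
print(f"CUDA available: {torch.cuda.is_available()}")
for i in range(torch.cuda.device_count()):
    props = torch.cuda.get_device_properties(i)
    print(f"GPU {i}: {props.name}, {props.total_memory / 1024**3:.0f} GB")
```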
- Install Dependencies:
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu128
pip install -r requirements.txt
- Optional Performance Optimizations:
pip install flash-attn==2.8.3 --no-build-isolation
pip install flashinfer-python

⚡ Tip: Ensure the PyTorch CUDA version matches the system CUDA version. The first inference with FlashInfer may be slower (~10 min) due to kernel compilation.
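To confirm the optional backends are actually importable before enabling them, a small check like the following can help (not part of the repository; flash_attn and flashinfer are the usual module names installed by these packages):

```python
import importlib.util

# If a backend is missing, keep the defaults: attn_implementation="sdpa", moe_impl="eager".
for pkg in ("flash_attn", "flashinfer"):
    found = importlib.util.find_spec(pkg) is not None
    print(f"{pkg}: {'installed' if found else 'not installed'}")
```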
from transformers import AutoModelForCausalLM
model_id = "./gen-image3"
kwargs = dict(
attn_implementation="sdpa", # "flash_attention_2" if installed
trust_remote_code=True,
torch_dtype="auto",
device_map="auto",
moe_impl="eager", # "flashinfer" if installed
)
model = AutoModelForCausalLM.from_pretrained(model_id, **kwargs)
model.load_tokenizer(model_id)
prompt = "A brown and white dog is running on the grass"
image = model.generate_image(prompt=prompt, stream=True)
image.save("image.png")git clone https://github.com/kantkrishan0206-crypto/gen-image3.0.git
cd gen-image3.0
# Download weights from HuggingFace or your storage
# Run demo
python3 run_image_gen.py --model-id ./gen-image3 --prompt "Your prompt here"

Command-line arguments:
| Argument | Description | Default |
|---|---|---|
| --prompt | Input text prompt | (Required) |
| --model-id | Model path | (Required) |
| --attn-impl | Attention type: sdpa / flash_attention_2 | sdpa |
| --moe-impl | MoE type: eager / flashinfer | eager |
| --image-size | Image resolution | auto |
| --save | Output image path | image.png |
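For example, a full invocation that overrides the defaults might look like the following (the flag names come from the table above; the value format accepted by --image-size is an assumption, since its default is auto):

```bash
python3 run_image_gen.py \
  --model-id ./gen-image3 \
  --prompt "A lighthouse on a cliff at sunset, photorealistic, warm light" \
  --attn-impl flash_attention_2 \
  --moe-impl flashinfer \
  --image-size 1024x1024 \
  --save lighthouse.png
```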
- Install Gradio:
pip install "gradio>=4.21.0"
- Configure Environment:
export MODEL_ID="path/to/your/model"
export GPUS="0,1,2,3"
export HOST="0.0.0.0"
export PORT="443"
- Launch Demo:
sh run_app.sh --moe-impl flashinfer --attn-impl flash_attention_2
- Open Web Interface:
http://localhost:443
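run_app.sh launches the bundled demo. For orientation, a minimal Gradio wrapper around the Python API from the Quick Start might look like the sketch below (this is not the repository's app code; it assumes generate_image returns a PIL image, as the earlier example implies):

```python
import os
import gradio as gr
from transformers import AutoModelForCausalLM

model_id = os.environ.get("MODEL_ID", "./gen-image3")
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
model.load_tokenizer(model_id)

def txt2img(prompt: str):
    # generate_image is the method shown in the Quick Start; assumed to return a PIL image.
    return model.generate_image(prompt=prompt, stream=True)

demo = gr.Interface(
    fn=txt2img,
    inputs=gr.Textbox(label="Prompt"),
    outputs=gr.Image(label="Generated image"),
)
demo.launch(
    server_name=os.environ.get("HOST", "0.0.0.0"),
    server_port=int(os.environ.get("PORT", "443")),
)
```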
- Manual Prompts: Describe the main subject first, then the environment, style, perspective, lighting, and technical parameters (see the example below).
- System Prompts: Prebuilt templates can automatically enhance user inputs for better results.
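A prompt written in that order (subject, then environment, style, perspective, lighting, technical parameters) might look like this illustrative example:

```text
A weathered fisherman repairing a blue wooden boat (subject), on a pebble beach beside a small harbor village (environment), impressionist oil-painting style with visible brushstrokes (style), low-angle view from the waterline (perspective), warm golden-hour light with long soft shadows (lighting), high detail, 4K, 3:2 aspect ratio (technical parameters).
```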
- Machine Evaluation (SSAE): Scores images against text prompts using semantic alignment metrics.
- Human Evaluation (GSB): Professionals rate image quality using the Good/Same/Bad (GSB) pairwise comparison method.
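GSB results are commonly summarized as the share of Good votes minus the share of Bad votes; the sketch below shows that aggregation (the exact formula used for gen-image3.0 is not specified here, so treat this as an assumption):

```python
from collections import Counter

def gsb_score(ratings):
    """Net win rate from pairwise Good/Same/Bad ratings of model A vs. model B."""
    counts = Counter(ratings)           # ratings is an iterable of "G", "S", "B"
    total = sum(counts.values()) or 1
    return (counts["G"] - counts["B"]) / total

# Example: 120 Good, 200 Same, 80 Bad -> (120 - 80) / 400 = +0.10 in favor of model A
print(gsb_score(["G"] * 120 + ["S"] * 200 + ["B"] * 80))
```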
We thank the open-source community for its invaluable contributions:
- 🤗 Transformers
- 🎨 Diffusers
- 🌐 HuggingFace
- ⚡ FlashAttention
- 🚀 FlashInfer
⭐ If you like this project, give it a star!
