
Cannot export Gemma3 model with int4 weight only #180

@Gasoonjia

Description


I'm trying to quantize the Gemma3 model with int4 weight-only quantization, but it fails:

(optimum-test) [[email protected] ~/optimum-executorch (main)]$ optimum-cli export executorch \
    --model "google/gemma-3-4b-it" \
    --task "multimodal-text-to-text" \
    --recipe "cuda" \
    --dtype bfloat16 \
    --device cuda \
    --max_seq_len 64 \
    --output_dir int4-2 \
    --qlinear_encoder 4w
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00,  4.58s/it]
Traceback (most recent call last):
  File "/home/gasoonjia/.conda/envs/optimum-test/bin/optimum-cli", line 7, in <module>
    sys.exit(main())
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 208, in main
    service.run()
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/commands/export/executorch.py", line 230, in run
    main_export(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/exporters/executorch/__main__.py", line 138, in main_export
    model = task_func(model_name_or_path, **kwargs)
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/exporters/executorch/tasks/multimodal_text_to_text.py", line 226, in load_multimodal_text_to_text_model
    quantize_model_(**quantize_encoder_kwargs)
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/exporters/executorch/quantization.py", line 108, in quantize_model_
    quantize_(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 541, in quantize_
    _replace_with_custom_fn_if_matches_filter(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 217, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 217, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 217, in _replace_with_custom_fn_if_matches_filter
    new_child = _replace_with_custom_fn_if_matches_filter(
  [Previous line repeated 3 more times]
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 212, in _replace_with_custom_fn_if_matches_filter
    model = replacement_fn(model, *extra_args)
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 2233, in _intx_weight_only_transform
    new_weight = _intx_weight_only_quantize_tensor(module.weight, config)
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 2188, in _intx_weight_only_quantize_tensor
    new_weight = IntxUnpackedToInt8Tensor.from_hp(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quantize_/workflows/intx/intx_unpacked_to_int8_tensor.py", line 185, in from_hp
    scale, zero_point = choose_qparams_affine(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
    return func(*args, **kwargs)
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 1238, in choose_qparams_affine
    return _choose_qparams_affine(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torch/_ops.py", line 1251, in __call__
    return self._op(*args, **kwargs)
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 1546, in _choose_qparams_affine
    shape_for_reduction, reduction_dims = _get_reduction_params(
  File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 300, in _get_reduction_params
    assert input_size[i] % block_size[i] == 0, (
AssertionError: Expecting input size at 1 dimension: 4304 to be divisible by block_size at 1 dimension: 32

I've tried patching in these two PRs:

#178

#165

but the error message is still the same.
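For context (my own analysis, not a confirmed diagnosis): the assertion fires because groupwise int4 quantization splits each weight row into groups of `group_size` elements, so every quantized dimension must be evenly divisible by the group size. The failing layer has input dimension 4304 (per the traceback), which the default group size of 32 does not divide. A minimal sketch of the arithmetic, with a hypothetical helper name:

```python
# Hypothetical illustration of the group-size constraint behind the
# AssertionError: groupwise quantization reshapes each weight row into
# groups of `group_size` elements, so the input dimension must be
# divisible by the group size.
def compatible_group_sizes(input_dim, candidates=(16, 32, 64, 128, 256)):
    """Return the candidate group sizes that evenly divide input_dim."""
    return [g for g in candidates if input_dim % g == 0]

# The failing linear layer has input dim 4304 (from the traceback);
# the default group size 32 leaves a remainder, hence the assert.
print(4304 % 32)                     # non-zero remainder -> assertion fails
print(compatible_group_sizes(4304))  # only 16 divides 4304 among the candidates
```

If this reading is right, a smaller group size (or skipping the incompatible layers via the quantization filter) would be the likely workaround, though which option optimum-executorch exposes for the encoder path is something the maintainers would need to confirm.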
