I'm trying to quantize the Gemma3 model with int4 weight-only quantization, but it fails:
(optimum-test) [[email protected] ~/optimum-executorch (main)]$ optimum-cli export executorch --model "google/gemma-3-4b-it" --task "multimodal-text-to-text" --recipe "cuda" --dtype bfloat16 --device cuda --max_seq_len 64 --output_dir int4-2 --qlinear_encoder 4w
Loading checkpoint shards: 100%|██████████| 2/2 [00:09<00:00, 4.58s/it]
Traceback (most recent call last):
File "/home/gasoonjia/.conda/envs/optimum-test/bin/optimum-cli", line 7, in <module>
sys.exit(main())
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/commands/optimum_cli.py", line 208, in main
service.run()
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/commands/export/executorch.py", line 230, in run
main_export(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/exporters/executorch/__main__.py", line 138, in main_export
model = task_func(model_name_or_path, **kwargs)
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/exporters/executorch/tasks/multimodal_text_to_text.py", line 226, in load_multimodal_text_to_text_model
quantize_model_(**quantize_encoder_kwargs)
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/optimum/exporters/executorch/quantization.py", line 108, in quantize_model_
quantize_(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 541, in quantize_
_replace_with_custom_fn_if_matches_filter(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 217, in _replace_with_custom_fn_if_matches_filter
new_child = _replace_with_custom_fn_if_matches_filter(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 217, in _replace_with_custom_fn_if_matches_filter
new_child = _replace_with_custom_fn_if_matches_filter(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 217, in _replace_with_custom_fn_if_matches_filter
new_child = _replace_with_custom_fn_if_matches_filter(
[Previous line repeated 3 more times]
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 212, in _replace_with_custom_fn_if_matches_filter
model = replacement_fn(model, *extra_args)
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 2233, in _intx_weight_only_transform
new_weight = _intx_weight_only_quantize_tensor(module.weight, config)
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_api.py", line 2188, in _intx_weight_only_quantize_tensor
new_weight = IntxUnpackedToInt8Tensor.from_hp(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quantize_/workflows/intx/intx_unpacked_to_int8_tensor.py", line 185, in from_hp
scale, zero_point = choose_qparams_affine(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 1238, in choose_qparams_affine
return _choose_qparams_affine(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torch/_ops.py", line 1251, in __call__
return self._op(*args, **kwargs)
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 1546, in _choose_qparams_affine
shape_for_reduction, reduction_dims = _get_reduction_params(
File "/home/gasoonjia/.conda/envs/optimum-test/lib/python3.10/site-packages/torchao/quantization/quant_primitives.py", line 300, in _get_reduction_params
assert input_size[i] % block_size[i] == 0, (
AssertionError: Expecting input size at 1 dimension: 4304 to be divisible by block_size at 1 dimension: 32
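For context, the assertion comes from group-wise quantization: each quantized weight dimension must divide evenly by the group size, and 4304 = 32 × 134 + 16, so it doesn't. A minimal sketch of the check (names and shapes are simplified illustrations, not the actual torchao source; 1152/4304 are the SigLIP-style vision-encoder dimensions assumed to trigger this):

```python
# Sketch of the divisibility check that fires in torchao's
# _get_reduction_params (simplified; not the actual implementation).
# Group-wise int4 quantization splits each weight row into groups of
# `block_size` elements, so every quantized dimension must divide evenly.

def find_indivisible_dims(input_size, block_size):
    """Return the dimension indices (if any) that violate the constraint."""
    return [
        i for i, (dim, blk) in enumerate(zip(input_size, block_size))
        if dim % blk != 0
    ]

# A linear layer with in_features = 4304 is not compatible with the
# default group size of 32, because 4304 % 32 == 16.
print(find_indivisible_dims((1152, 4304), (1, 32)))  # → [1]
```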
I've tried patching these two PRs, but the error message is still the same.
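One possible workaround (untested here, and assuming your torchao version's `quantize_` accepts a `filter_fn` argument, which recent releases do) is to skip the linear layers whose `in_features` is not divisible by the group size. A pure-Python sketch of such a predicate, with stand-in classes so it runs without torch; all names here are hypothetical:

```python
# Hypothetical filter predicate: quantize only Linear-like modules whose
# in_features divides evenly by the chosen group size. In torchao this
# could be passed as `filter_fn` to `quantize_(model, config, filter_fn=...)`
# (assumption: your installed torchao exposes that parameter).

GROUP_SIZE = 32  # assumed default group size for the 4w recipe

def is_quantizable_linear(module, fqn):
    """Return True only for modules safe to group-quantize at GROUP_SIZE."""
    in_features = getattr(module, "in_features", None)
    return in_features is not None and in_features % GROUP_SIZE == 0

# Minimal stand-in to demonstrate the predicate without torch installed:
class FakeLinear:
    def __init__(self, in_features):
        self.in_features = in_features

print(is_quantizable_linear(FakeLinear(4096), "model.layers.0.q_proj"))  # → True
print(is_quantizable_linear(FakeLinear(4304), "vision_tower.fc2"))       # → False
```

The layers left unquantized stay in bfloat16, which trades a little model size for avoiding the divisibility constraint; alternatively, a smaller group size that divides 4304 (e.g. 16) may work if the recipe allows configuring it.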