
How to load a LoRA into a quantized model #580

Open

mrlihellohorld opened this issue Dec 4, 2024 · 4 comments
@mrlihellohorld
System Info

ubuntu

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

1

Expected behavior

The model is quantized as follows:

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import T5EncoderModel
from torchao.quantization import quantize_, int8_weight_only

quantization = int8_weight_only
text_encoder = T5EncoderModel.from_pretrained(model_path, subfolder="text_encoder",
                                              torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained(model_path, subfolder="transformer",
                                                          torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained(model_path, subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Create pipeline and run inference
self.pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    model_path,
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
# if lora_path:
#     pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="test_1")
#     pipe.fuse_lora(lora_scale=1 / lora_rank)

self.pipe.to(self.device)
# pipe.enable_model_cpu_offload()
self.pipe.vae.enable_tiling()
self.pipe.vae.enable_slicing()
The error is raised when loading the LoRA:

if lora_path:
    self.pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="zoom in")
    self.pipe.fuse_lora(lora_scale=1 / lora_rank)
Error message:

Traceback (most recent call last):
  File "", line 1, in
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/Cogvideox_demo.py", line 74, in dynamic_lora
    self.pipe.fuse_lora(lora_scale=1 / lora_rank)
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/lora_pipeline.py", line 2646, in fuse_lora
    super().fuse_lora(
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/lora_base.py", line 459, in fuse_lora
    model.fuse_lora(lora_scale, safe_fusing=safe_fusing, adapter_names=adapter_names)
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/peft.py", line 567, in fuse_lora
    self.apply(partial(self.fuse_lora_apply, adapter_names=adapter_names))
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 895, in apply
    module.apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 895, in apply
    module.apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 895, in apply
    module.apply(fn)
  [Previous line repeated 1 more time]
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 896, in apply
    fn(self)
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/peft.py", line 589, in fuse_lora_apply
    module.merge(**merge_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 483, in merge
    base_layer.weight.data += delta_weight
  File "/root/miniconda3/lib/python3.8/site-packages/torchao/dtypes/utils.py", line 56, in _dispatch__torch_function__
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torchao/dtypes/utils.py", line 70, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.add_.Tensor
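The last frames show what goes wrong: PEFT's merge folds the LoRA delta into the base weight with an in-place add, and torchao's AffineQuantizedTensor implements no kernel for that op. A minimal sketch of the merge step on plain tensors (shapes and variable names here are hypothetical, chosen for illustration):

```python
import torch

# Sketch of the merge in peft/tuners/lora/layer.py: fuse_lora folds the
# low-rank update into the base weight in place.
rank, fan_in, fan_out = 4, 16, 32
base_weight = torch.randn(fan_out, fan_in)   # W (bf16 in the real model)
lora_A = torch.randn(rank, fan_in)           # LoRA down-projection A
lora_B = torch.randn(fan_out, rank)          # LoRA up-projection B
lora_scale = 1.0

delta_weight = lora_B @ lora_A * lora_scale  # delta_W = B @ A * scale
base_weight += delta_weight                  # the in-place add the traceback hits

# After int8_weight_only quantization, base_weight is an AffineQuantizedTensor
# instead of a plain tensor, and torchao has no add kernel for it, hence the
# NotImplementedError.
```

On plain tensors this works; the failure only appears once the base weight has been replaced by a quantized tensor subclass.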

@zRzRzRzRzRzRzR (Member)

My understanding is that this version of the model cannot have a LoRA attached. FP8 currently uses about as much VRAM as BF16, so we recommend BF16 + CPU offload instead.

zRzRzRzRzRzRzR self-assigned this Dec 4, 2024
@mrlihellohorld (Author)

My understanding is that this version of the model cannot have a LoRA attached. FP8 currently uses about as much VRAM as BF16, so we recommend BF16 + CPU offload instead.

If the CogVideoX1.5-5b-I2V model is not quantized, the LoRA can be loaded without problems. The issue is that after quantization the model can no longer be merged with the LoRA; the data formats don't match.

@zRzRzRzRzRzRzR (Member)

Oh, the version I meant is FP8. BF16 works normally; FP8 hits exactly the problem you describe. We haven't worked on this yet, and with limited manpower we probably can't support it for the time being.

@GSK666

GSK666 commented Dec 17, 2024

Oh, the version I meant is FP8. BF16 works normally; FP8 hits exactly the problem you describe. We haven't worked on this yet, and with limited manpower we probably can't support it for the time being.

May I ask whether it's possible to load the LoRA first and then quantize?
