
How to load a LoRA into a quantized model #580

Open

mrlihellohorld opened this issue Dec 4, 2024 · 4 comments
@mrlihellohorld
System Info

ubuntu

Information

  • The official example scripts
  • My own modified scripts and tasks

Reproduction

1

Expected behavior

The model is quantized as follows:

import torch
from diffusers import AutoencoderKLCogVideoX, CogVideoXTransformer3DModel, CogVideoXImageToVideoPipeline
from diffusers.utils import export_to_video, load_image
from transformers import T5EncoderModel
from torchao.quantization import quantize_, int8_weight_only

quantization = int8_weight_only
text_encoder = T5EncoderModel.from_pretrained(model_path, subfolder="text_encoder",
                                              torch_dtype=torch.bfloat16)
quantize_(text_encoder, quantization())

transformer = CogVideoXTransformer3DModel.from_pretrained(model_path, subfolder="transformer",
                                                          torch_dtype=torch.bfloat16)
quantize_(transformer, quantization())

vae = AutoencoderKLCogVideoX.from_pretrained(model_path, subfolder="vae", torch_dtype=torch.bfloat16)
quantize_(vae, quantization())

# Create pipeline and run inference
self.pipe = CogVideoXImageToVideoPipeline.from_pretrained(
    model_path,
    text_encoder=text_encoder,
    transformer=transformer,
    vae=vae,
    torch_dtype=torch.bfloat16,
)
# if lora_path:
#     pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="test_1")
#     pipe.fuse_lora(lora_scale=1 / lora_rank)

self.pipe.to(self.device)
# pipe.enable_model_cpu_offload()
self.pipe.vae.enable_tiling()
self.pipe.vae.enable_slicing()
The error is raised when loading the LoRA:

if lora_path:
    self.pipe.load_lora_weights(lora_path, weight_name="pytorch_lora_weights.safetensors", adapter_name="zoom in")
    self.pipe.fuse_lora(lora_scale=1 / lora_rank)
Error message:

Traceback (most recent call last):
  File "", line 1, in
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/Cogvideox_demo.py", line 74, in dynamic_lora
    self.pipe.fuse_lora(lora_scale=1 / lora_rank)
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/lora_pipeline.py", line 2646, in fuse_lora
    super().fuse_lora(
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/lora_base.py", line 459, in fuse_lora
    model.fuse_lora(lora_scale, safe_fusing=safe_fusing, adapter_names=adapter_names)
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/peft.py", line 567, in fuse_lora
    self.apply(partial(self.fuse_lora_apply, adapter_names=adapter_names))
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 895, in apply
    module.apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 895, in apply
    module.apply(fn)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 895, in apply
    module.apply(fn)
  [Previous line repeated 1 more time]
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 896, in apply
    fn(self)
  File "/code/ace/CogVideo/smart_g_image_to_video_sobey/app/src/I2VExtensionPackage/diffusers/loaders/peft.py", line 589, in fuse_lora_apply
    module.merge(**merge_kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/peft/tuners/lora/layer.py", line 483, in merge
    base_layer.weight.data += delta_weight
  File "/root/miniconda3/lib/python3.8/site-packages/torchao/dtypes/utils.py", line 56, in _dispatch__torch_function__
    return func(*args, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torchao/dtypes/utils.py", line 70, in _dispatch__torch_dispatch__
    raise NotImplementedError(f"{cls.__name__} dispatch: attempting to run unimplemented operator/function: {func}")
NotImplementedError: AffineQuantizedTensor dispatch: attempting to run unimplemented operator/function: aten.add_.Tensor
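The last frames show what goes wrong: PEFT's merge folds the LoRA delta into the base weight with an in-place add, and torchao's AffineQuantizedTensor implements no kernel for that op. A minimal sketch of the merge step on plain tensors (shapes and variable names here are hypothetical, chosen for illustration):

```python
import torch

# Sketch of the merge in peft/tuners/lora/layer.py: fuse_lora folds the
# low-rank update into the base weight in place.
rank, fan_in, fan_out = 4, 16, 32
base_weight = torch.randn(fan_out, fan_in)   # W (bf16 in the real model)
lora_A = torch.randn(rank, fan_in)           # LoRA down-projection A
lora_B = torch.randn(fan_out, rank)          # LoRA up-projection B
lora_scale = 1.0

delta_weight = lora_B @ lora_A * lora_scale  # delta_W = B @ A * scale
base_weight += delta_weight                  # the in-place add the traceback hits

# After int8_weight_only quantization, base_weight is an AffineQuantizedTensor
# instead of a plain tensor, and torchao has no add kernel for it, hence the
# NotImplementedError.
```

On plain tensors this works; the failure only appears once the base weight has been replaced by a quantized tensor subclass.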

@zRzRzRzRzRzRzR (Member)

My understanding is that this version of the model cannot have a LoRA attached. FP8 currently uses about as much VRAM as BF16, so we recommend BF16 + CPU offload instead.

zRzRzRzRzRzRzR self-assigned this Dec 4, 2024
@mrlihellohorld (Author)

My understanding is that this version of the model cannot have a LoRA attached. FP8 currently uses about as much VRAM as BF16, so we recommend BF16 + CPU offload instead.

If the CogVideoX1.5-5b-I2V model is not quantized, the LoRA can be loaded without problems. The issue is that after quantization the model can no longer be merged with the LoRA; the data formats don't match.

@zRzRzRzRzRzRzR (Member)

Oh, the version I meant is FP8. BF16 works normally; FP8 hits exactly the problem you describe. We haven't worked on this yet, and with limited manpower we probably can't support it for the time being.

@GSK666

GSK666 commented Dec 17, 2024

Oh, the version I meant is FP8. BF16 works normally; FP8 hits exactly the problem you describe. We haven't worked on this yet, and with limited manpower we probably can't support it for the time being.

May I ask whether it's possible to load the LoRA first and then quantize?
