
The value of each transformer block in the inference phase is particularly large #573

lgs00 opened this issue Dec 3, 2024 · 1 comment


lgs00 commented Dec 3, 2024

When running inference with CogVideoX-1.5-t2v, I noticed something interesting: after `self.norm1 = CogVideoXLayerNormZero` is applied, the values of `norm_hidden_states` are particularly large. What is the reason? The relevant line in diffusers: https://github.com/huggingface/diffusers/blob/30f2e9bd202c89bb3862c8ada470d0d1ac8ee0e5/src/diffusers/models/transformers/cogvideox_transformer_3d.py#L127
norm_hidden_states, norm_encoder_hidden_states, gate_msa, enc_gate_msa = self.norm1(hidden_states, encoder_hidden_states, temb)
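For context, `CogVideoXLayerNormZero` applies an AdaLN-zero style modulation: the hidden states are layer-normalized and then rescaled and shifted by per-channel parameters projected from the timestep embedding (`temb`). A minimal NumPy sketch (simplified; the actual projection code is in the linked diffusers file) shows why the output can be far larger than a plain LayerNorm's:

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """Plain LayerNorm over the last axis: zero mean, unit variance."""
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ada_layer_norm_zero(x, scale, shift):
    """AdaLN-zero style modulation: normalize, then rescale and shift.

    In CogVideoX, `scale` and `shift` are produced per channel by a
    Linear layer applied to the timestep embedding; here they are
    passed in directly for illustration.
    """
    return layer_norm(x) * (1.0 + scale) + shift

x = np.random.randn(4, 8)            # toy (token, channel) activations
print(np.abs(layer_norm(x)).max())   # O(1): the LayerNorm itself bounds the values

# If the conditioning network emits a large per-channel scale, the
# modulated output is large even though the normalized input is O(1).
big = ada_layer_norm_zero(x, scale=np.full(8, 50.0), shift=0.0)
print(np.abs(big).max())             # exceeds 50 with this scale
```

So large `norm_hidden_states` values point at the learned `scale`/`shift` coming out of the conditioning projection, not at the normalization itself.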
[screenshot: norm_hidden_states values for CogVideoX-1.5]

@lgs00 lgs00 closed this as completed Dec 3, 2024
@lgs00 lgs00 reopened this Dec 3, 2024
lgs00 commented Dec 3, 2024

But when I run inference with CogVideoX-1.0-5b, `norm_hidden_states` looks normal. I wonder why the magnitude changes so much in 1.5, while the 1.0 result matches expectations. I'm curious what causes this change; `norm_hidden_states` normally shouldn't exceed 100.
The result for 1.0 is as follows:
[screenshot: norm_hidden_states values for CogVideoX-1.0-5b]
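To compare the two checkpoints more precisely than eyeballing tensor printouts, a small framework-agnostic helper (a sketch, not part of diffusers) can log magnitude statistics for any intermediate activation, e.g. called on `norm_hidden_states` right after the `self.norm1` line:

```python
import numpy as np

def summarize(name, arr):
    """Print simple magnitude statistics for an intermediate activation.

    `arr` is any array-like; for a PyTorch tensor, pass
    `tensor.detach().float().cpu().numpy()`. Returns the stats so they
    can also be collected per block and compared across checkpoints.
    """
    a = np.asarray(arr, dtype=np.float64)
    stats = {"min": float(a.min()),
             "max": float(a.max()),
             "mean_abs": float(np.abs(a).mean())}
    print(f"{name}: " + ", ".join(f"{k}={v:.3f}" for k, v in stats.items()))
    return stats
```

Logging these per transformer block for both 1.0 and 1.5 would show whether the blow-up appears in every block or grows with depth.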
