Hi,
This is a really good repo for learning Stable Diffusion from scratch. However, I noticed that the scaling factor that should be applied to the latent $z$ before it is passed to the U-Net is missing. As I understand it, the factor rescales the latents to roughly unit variance, which is said to make training easier. A detailed discussion can be found at:
huggingface/diffusers#437
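
To make the suggestion concrete, here is a minimal sketch of where the factor would go. The value 0.18215 is the one used by Stable Diffusion v1; whether it is the right value for this repo's VAE is an assumption. The helper names `scale_latent` and `unscale_latent` are hypothetical and not taken from this repo, and the random numbers below only stand in for real VAE latents.

```python
import random
import statistics

# Value used by Stable Diffusion v1; assumed here, may differ for other VAEs.
SCALING_FACTOR = 0.18215


def scale_latent(z, factor=SCALING_FACTOR):
    """Hypothetical helper: apply right after VAE encoding, before noising / the U-Net."""
    return z * factor


def unscale_latent(z, factor=SCALING_FACTOR):
    """Hypothetical helper: undo the scaling right before VAE decoding."""
    return z / factor


# Toy stand-in for raw VAE latents: zero mean, std of about 1 / 0.18215.
random.seed(0)
raw = [random.gauss(0.0, 1.0 / SCALING_FACTOR) for _ in range(10_000)]
scaled = [scale_latent(v) for v in raw]

print("std before scaling:", statistics.pstdev(raw))
print("std after scaling: ", statistics.pstdev(scaled))
```

Since the helpers only use `*` and `/`, they should work unchanged on tensors. In a pipeline, I would expect `scale_latent` to sit between the encoder and the noise step, and `unscale_latent` just before the decoder.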
Cheers,
Liyan