Running Stable Diffusion fully in the browser with WebGPU and TVM Unity

Hello community,

We are excited to share the project we released today: Web Stable Diffusion, the world’s first Stable Diffusion pipeline that runs completely in the browser. Stable Diffusion models are heavy and usually require a server to run; this time, we bring them entirely to the browser side. A runnable demo is available on the website, and you are welcome to check out our GitHub repo for more details.

This project would not be possible without the open-source ecosystem, especially Apache TVM and TVM Unity. We want to thank everyone involved in the Apache TVM Unity effort so far. We are excited to see the concept of TVM Unity making a difference, bringing machine learning closer to people and enabling even more opportunities. Web Stable Diffusion is very exciting to us, and we believe it is just the starting point of a long journey as we continue to develop, gain experience with, and push forward Apache TVM Unity.


This is a great demonstration of what we can do with Unity. Would love to see how it also enables other incremental deployment processes, such as bringing things to mobile devices.

We are happy to present our work at TVMCon today! Here is the Jupyter notebook that gives you a full view of how we built it with the help of TVM Unity. It covers the model import steps we walked through in the TVMCon tutorial “Introduction to Unity: Relax and PyTorch”, along with very detailed guidance on build and deployment. Everyone is welcome to check it out!
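To give a taste of the model import step, here is a minimal sketch (not the notebook’s exact code: the tiny module below is just a stand-in for the real UNet/VAE/CLIP sub-models, and the input shape is made up) of bringing an FX-traced PyTorch module into Relax with TVM Unity:

```python
import torch
from torch import fx, nn
import tvm
from tvm import relax
from tvm.relax.frontend.torch import from_fx


class TinyModel(nn.Module):
    """Stand-in for the real Stable Diffusion sub-models (UNet, VAE, CLIP)."""

    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(64, 64)
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.linear(x))


model = TinyModel().eval()
graph = fx.symbolic_trace(model)                # capture the module as an FX graph
mod = from_fx(graph, [((1, 64), "float32")])    # (shape, dtype) for each input
mod.show()                                      # print the resulting Relax IRModule
```

From there, the notebook continues with the optimization, build, and web deployment steps.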


Hello, I’m trying to trace the model to Relax. When tracing the VAE model, I got this error: “unsupported scaled_dot_product_attention op”. I’m using diffusers 0.16 and torch 2.0.0+cpu; it seems that torch or diffusers fuses some ops directly. How did you overcome this issue? When I print the symbolic traced graph, I notice the line "scaled_dot_product_attention = torch._C._nn.scaled_dot_product_attention(permute, permute_1, permute_2, dropout_p=0.0, is_causal=False)".
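The only workaround I can think of so far (not sure if this is what you did) is to switch diffusers back to the unfused attention processor before tracing, so the fused op never shows up in the FX graph. A rough sketch, with the checkpoint name only as an example:

```python
from diffusers import StableDiffusionPipeline
from diffusers.models.attention_processor import AttnProcessor

# With torch >= 2.0, diffusers defaults to AttnProcessor2_0, which calls the
# fused scaled_dot_product_attention. Replacing it with the plain AttnProcessor
# keeps attention unfused, so FX tracing only sees ordinary matmul/softmax ops.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")
pipe.unet.set_attn_processor(AttnProcessor())

# Depending on the diffusers version, the VAE may expose the same hook.
if hasattr(pipe.vae, "set_attn_processor"):
    pipe.vae.set_attn_processor(AttnProcessor())
```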


I tried to compile the SD model from Hugging Face and ran into the same problem as you.

Currently, I’m trying to use the meta_schedule module to get an optimized model. Sadly, it doesn’t work: after tuning, when I call relax.build, it shows the typical “Did you forget to bind?” error. You can find detailed info here: https://github.com/mlc-ai/web-stable-diffusion/issues/43.

By the way, the DefaultGPUSchedule pass does work.
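For reference, the path that does work for me looks roughly like this (a simplified sketch; `mod` stands for my already-imported Relax IRModule and the CUDA target is just an example):

```python
import tvm
from tvm import relax

target = tvm.target.Target("cuda")
with target:
    # Bind GPU threads for every still-unscheduled TIR PrimFunc.
    mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
ex = relax.build(mod, target)
```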

The default GPU schedule is very slow; you can use meta_schedule to auto-tune your model. Sometimes, though, meta_schedule fails to find a valid schedule for certain ops, such as softmax, and those unscheduled functions are what cause the “Did you forget to bind?” error.
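If it helps, the pattern I have in mind looks roughly like this (a hedged sketch, not the project’s exact build script; `mod` is your Relax IRModule and `db` is the meta_schedule database produced by tuning): apply whatever schedules the tuner recorded, then let DefaultGPUSchedule cover the functions it could not handle, so relax.build no longer complains about unbound threads.

```python
import tvm
from tvm import relax


def build_with_tuning_fallback(mod, db, target_str="cuda"):
    target = tvm.target.Target(target_str)
    with target, db:
        # Apply the schedules meta_schedule recorded in the database.
        mod = relax.transform.MetaScheduleApplyDatabase()(mod)
        # Any PrimFunc the tuner skipped or failed on (e.g. softmax) still
        # needs its GPU threads bound, otherwise relax.build raises the
        # "Did you forget to bind?" error.
        mod = tvm.tir.transform.DefaultGPUSchedule()(mod)
    return relax.build(mod, target)
```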