"AssertionError: Unsupported function type scaled_dot_product_attention" when trying to build Stable Diffusion

I was trying to build Stable Diffusion for targets other than WebGPU (CPU and Vulkan). When I run the build script, I get this error while tracing the VAE:

Traceback (most recent call last):
  File ".../web-stable-diffusion/build.py", line 157, in <module>
    mod, params = trace_models(torch_dev_key)
  File ".../web-stable-diffusion/build.py", line 82, in trace_models
    vae = trace.vae_to_image(pipe)
  File ".../web-stable-diffusion/web_stable_diffusion/trace/model_trace.py", line 82, in vae_to_image
    mod = dynamo_capture_subgraphs(
 
...

  File ".../relax/python/tvm/relax/frontend/torch/dynamo.py", line 151, in _capture
    mod_ = from_fx(
  File ".../relax/python/tvm/relax/frontend/torch/fx_translator.py", line 1388, in from_fx
    return TorchFXImporter().from_fx(
  File ".../relax/python/tvm/relax/frontend/torch/fx_translator.py", line 1273, in from_fx
    func_name in self.convert_map
torch._dynamo.exc.BackendCompilerFailed: backend='_capture' raised:
AssertionError: Unsupported function type scaled_dot_product_attention


You can suppress this exception and fall back to eager by setting:
    import torch._dynamo
    torch._dynamo.config.suppress_errors = True

I built the mlc-ai/relax lib from source just to be sure, and I still get this error. It seems the PyTorch attention operation fails to be traced because there is no TVM implementation for it. I was wondering how you managed to build SD for WebGPU? Do I have to provide the attention implementation myself?

Well, I managed to fix this error by turning off the custom attention in the diffusers library: https://github.com/huggingface/diffusers/blob/fa9e35fca4f32436f4c6bb890a1b3dfcefa465f7/src/diffusers/models/attention.py#L75
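For reference, the workaround can also be applied as a monkey-patch instead of editing the library in place. This is only a sketch, assuming the pinned diffusers revision gates the PyTorch 2.x fast path behind an `_use_2_0_attn` flag on `AttentionBlock` (the line linked above); the attribute name may differ in other versions:

from diffusers.models import attention

# Assumption: in this diffusers revision, AttentionBlock.__init__ sets
# self._use_2_0_attn = True, which routes forward() through
# F.scaled_dot_product_attention under torch >= 2.0. Forcing the flag off
# falls back to the explicit baddbmm + softmax attention, which the relax
# FX importer can translate.
_orig_init = attention.AttentionBlock.__init__

def _patched_init(self, *args, **kwargs):
    _orig_init(self, *args, **kwargs)
    self._use_2_0_attn = False  # flag name is an assumption for this revision

attention.AttentionBlock.__init__ = _patched_init
# Apply this before the pipeline is constructed in build.py.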

But I’m getting another error when building the model for the LLVM/CPU target:

  File ".../web-stable-diffusion/build.py", line 136, in build
    ex = relax.build(mod_deploy, args.target)
  File ".../relax/python/tvm/relax/vm_build.py", line 337, in build
    return _vmlink(builder, target, tir_mod, ext_libs, params, system_lib=system_lib)
  File ".../python/tvm/relax/vm_build.py", line 242, in _vmlink
    lib = tvm.build(
  File ".../relax/python/tvm/driver/build_module.py", line 281, in build
    rt_mod_host = _driver_ffi.tir_to_runtime(annotated_mods, target_host)
  File ".../relax/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  8: TVMFuncCall
  7: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)>::AssignTypedLambda<tvm::__mk_TVM22::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}>(tvm::__mk_TVM22::{lambda(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
  6: tvm::TIRToRuntime(tvm::runtime::Map<tvm::Target, tvm::IRModule, void, void> const&, tvm::Target const&)
  5: tvm::codegen::Build(tvm::IRModule, tvm::Target)
  4: tvm::runtime::PackedFuncObj::Extractor<tvm::runtime::PackedFuncSubObj<tvm::runtime::TypedPackedFunc<tvm::runtime::Module (tvm::IRModule, tvm::Target)>::AssignTypedLambda<tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}>(tvm::codegen::__mk_TVM0::{lambda(tvm::IRModule, tvm::Target)#1}, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)::{lambda(tvm::runtime::TVMArgs const&, tvm::runtime::TVMRetValue*)#1}> >::Call(tvm::runtime::PackedFuncObj const*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, tvm::runtime::TVMRetValue)
  3: tvm::codegen::LLVMModuleNode::Init(tvm::IRModule const&, tvm::Target const&)
  2: tvm::codegen::CodeGenCPU::AddFunction(tvm::tir::PrimFunc const&)
  1: tvm::codegen::CodeGenLLVM::AddFunctionInternal(tvm::tir::PrimFunc const&, bool)
  0: tvm::codegen::CodeGenLLVM::VisitStmt_(tvm::tir::AttrStmtNode const*)
  File ".../relax/src/target/llvm/codegen_llvm.cc", line 365
TVMError: not implemented

The line that crashes is the following:

llvm::Value* CodeGenLLVM::GetThreadIndex(const IterVar& iv) { LOG(FATAL) << "not implemented"; }
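This "not implemented" seems to mean the lowered TIR still carries GPU thread bindings (threadIdx/blockIdx), which the LLVM CPU backend cannot translate into thread indices. A minimal sketch, not taken from the build script and just assuming the usual TVMScript API, that hits the same code path:

import tvm
from tvm.script import tir as T

@T.prim_func
def add_one(A: T.Buffer((1024,), "float32"), B: T.Buffer((1024,), "float32")):
    # Bind the loop to a GPU thread axis, as the WebGPU/Vulkan schedules do.
    for i in T.thread_binding(1024, thread="threadIdx.x"):
        with T.block("compute"):
            vi = T.axis.spatial(1024, i)
            B[vi] = A[vi] + T.float32(1)

# Building this for "llvm" reaches CodeGenLLVM::GetThreadIndex and fails with
# "not implemented", because the CPU backend has no notion of threadIdx.
tvm.build(add_one, target="llvm")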

So I’m wondering: is it simply not possible to build for a CPU target?
