[TE Compiler] Device type error in UpdateMainWorkspaceSize

Auto-scheduler failed to extract tasks with the heterogeneous target, as reported in https://discuss.tvm.apache.org/t/can-autoscheduler-support-tuning-for-multiple-targets/1048. I did some investigation and found that we need to manually call `target = relay.build_module.build_target_by_device_type_map(target)` to first transform the target into a dict. However, even with that change, I still hit an error in the TE compiler.
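For context, here is roughly what the extraction path looks like with that workaround applied. This is only a sketch: the model, the shape of the target dict, and the target strings are illustrative assumptions, not the exact repro (the actual script is linked below).

```python
from tvm import auto_scheduler, relay
from tvm.relay import testing

# Illustrative model; the actual repro uses a TF2 SSD MobileNet V2 module.
mod, params = testing.mobilenet.get_workload(batch_size=1)

# Heterogeneous target: CPU host plus CUDA device (illustrative strings).
target = {"cpu": "llvm", "cuda": "cuda"}

# Workaround mentioned above: normalize the target into a
# device-type -> Target dict before task extraction.
target = relay.build_module.build_target_by_device_type_map(target)

# Task extraction then still fails inside UpdateMainWorkspaceSize with the
# error shown below.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
```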

The error is:

TVMError: No target is provided for device llvm

where the input targets for UpdateMainWorkspaceSize are:

1: llvm -keys=cpu -link-params=0
2: cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32

I have no idea why the TE compiler is looking for llvm under 0 instead of 1. The weirdest thing is that this error won't happen if we directly build the Relay module.

To reproduce, use this script (target_debug.py · GitHub) with my local branch: GitHub - comaniac/tvm at test_target

cc @jroesch @masahi @Annie

@comaniac there is some strange code in there that uses 0 as a default, iirc. I remember debugging a super familiar-looking bug like this at some point in the past, but I now can't remember what the resolution was. I half remember changing this code, realizing some strange invariant, and putting it back. One of my next goals is to clean up target handling in this piece of code, as it's currently a mess imo.

Got it. Thanks for sharing, and looking forward to your cleanup :slight_smile:

I've also found that the number of extracted tasks seems different after the TE compiler PR. For example, previously TF2 SSD MobileNet V2 extracted 57 tasks, but now it extracts 130 tasks. Investigating…

Ok, the reason is that identical workloads now get a unique suffix appended to func_name, like fused_nn_conv2d_add_16, while previously func_name was the same for identical workloads. So currently the lookup at tvm/relay_integration.py at 9c66587218dff190b384418d1d0e53d79dc8d288 · apache/tvm · GitHub always fails and all tasks end up having weight 1.
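To make the failure mode concrete, here is a simplified sketch of the lookup behavior described above (not the actual relay_integration.py code; the cache and function names here are illustrative):

```python
# Simplified sketch: the task cache is keyed on the function name as well
# as the workload, so a unique suffix per instance defeats deduplication.
wkl_key_to_weight = {}

def add_task(func_name, workload_key):
    key = (func_name, workload_key)
    wkl_key_to_weight[key] = wkl_key_to_weight.get(key, 0) + 1

# Before the TE compiler PR: identical workloads shared a func_name,
# so the second call hit the cache and the weight became 2.
add_task("fused_nn_conv2d_add", "wkl_A")
add_task("fused_nn_conv2d_add", "wkl_A")

# After: each instance gets a unique suffix, every lookup misses, and each
# task ends up with weight 1 (which is how 57 tasks turn into 130).
add_task("fused_nn_conv2d_add_16", "wkl_A")
add_task("fused_nn_conv2d_add_17", "wkl_A")
```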

Apparently this only happens when the VM compiler is used for task extraction.

Hmm, do you know why only the VM compiler has this issue?

It seems that the TE compiler PR ([RFC] TECompiler: Staged refactor and removal of compile engine by csullivan · Pull Request #7518 · apache/tvm · GitHub) only changed graph codegen to use the new TE compiler. VM still uses the old compile engine, but #7518 did change the VM compiler to make function names unique (tvm/compile_engine.cc at 9c66587218dff190b384418d1d0e53d79dc8d288 · apache/tvm · GitHub). So now, VM task extraction fails because identical workloads have different func_names, causing the workload lookup to fail.

For graph codegen, it seems the TE compiler doesn't repeatedly call auto_schedule_topi_compute on identical workloads. So each unique task is extracted only once, and the weight is correctly updated by the te_compiler_update_weights call. This doesn't apply to VM codegen, where the same task is extracted multiple times. But since the key is now different, the workload cache lookup fails and identical tasks are returned as distinct tasks.
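As a rough sketch of the difference between the two paths (heavily simplified; not the actual codegen code, and the function names are made up for illustration):

```python
# Graph codegen (new TE compiler): each unique workload is lowered once,
# and the weight is filled in afterwards from the number of call sites
# (the te_compiler_update_weights step mentioned above).
def graph_codegen_tasks(call_counts):          # {workload_key: num_call_sites}
    return dict(call_counts)

# VM codegen (old compile engine): the same workload is lowered once per
# call site, so deduplication relies entirely on the cache lookup keyed by
# (func_name, workload_key) -- which the unique suffixes now break.
def vm_codegen_tasks(lowered):                 # [(func_name, workload_key), ...]
    weights = {}
    for func_name, wkl in lowered:
        key = (func_name, wkl)                  # func_name differs per copy,
        weights[key] = weights.get(key, 0) + 1  # so every task stays at weight 1
    return weights
```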

I see… I cannot recall clearly why we need to look up both the function name and the workload key, but I'll try to find some time next week to take a look.

cc @merrymercy @jcf94


Yes, I believe a workload hash should be sufficient for the key, since previously the same workloads shared the same func name anyway (but now func names are unique).

I'm not sure either.

Emm … I remember we've tried to share AutoScheduler's results across workloads with different parameters?

Ah, I found it. It's not because of separating parameters, but because the function name was added to the task description for debugging purposes.

I think we can remove the function name from the extracted task key and simply show one of them for debugging. I'll prepare a PR today.
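Roughly what I have in mind, as an illustrative sketch (not the actual PR; names are made up):

```python
# Sketch of the proposed change: key extracted tasks on the workload alone
# and keep one representative function name purely for display/debugging.
wkl_key_to_weight = {}   # workload_key -> (weight, representative func_name)

def add_task(func_name, workload_key):
    weight, shown_name = wkl_key_to_weight.get(workload_key, (0, func_name))
    wkl_key_to_weight[workload_key] = (weight + 1, shown_name)

# Identical workloads now collapse into a single task with weight 2, even
# though their func_names differ after the TE compiler PR.
add_task("fused_nn_conv2d_add_16", "wkl_A")
add_task("fused_nn_conv2d_add_17", "wkl_A")
```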

PR filed

I will follow up directly on GitHub. I think we will need to iterate on task extraction as we clean up the lowering, etc. I am preparing an RFC on unifying the lowering.