Auto-scheduler failed to extract tasks with a heterogeneous target, as reported in https://discuss.tvm.apache.org/t/can-autoscheduler-support-tuning-for-multiple-targets/1048. I did some investigation and found that we need to manually call target = relay.build_module.build_target_by_device_type_map(target) to first transform the target into a dict. However, even with that change, I still got an error in the TE compiler.
The error is:
TVMError: No target is provided for device llvm
where the input targets for UpdateMainWorkspaceSize are:
1: llvm -keys=cpu -link-params=0
2: cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32
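For context, here is a minimal sketch of the setup being described (the exact form of the heterogeneous target dict, and mod/params coming from a frontend importer, are assumptions on my side); this is the workaround that still hits the error above:

```python
from tvm import relay, auto_scheduler

# Assumed heterogeneous target: one target string per device kind.
target = {"cpu": "llvm", "cuda": "cuda"}

# Workaround mentioned above: normalize the target into a
# device-type -> Target map before task extraction.
target = relay.build_module.build_target_by_device_type_map(target)

# mod / params are assumed to come from a frontend importer,
# e.g. relay.frontend.from_tensorflow.
tasks, task_weights = auto_scheduler.extract_tasks(mod["main"], params, target)
```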
I have no idea why the TE compiler is looking for llvm at 0 instead of 1. The weirdest thing is that this error won't happen if we build the Relay module directly.
@comanic there is some strange code in there that uses 0 as a default, iirc; I remember debugging a very familiar-looking bug like this at some point in the past. I now can't remember what the resolution was. I half remember changing this code, realizing there was some strange invariant, and putting it back. One of my next goals is to clean up target handling in this piece of code, as it's currently a mess imo.
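To make that failure mode concrete, here is a hypothetical illustration (not the actual TE compiler code) of what a 0-as-default lookup would do against the target map printed above:

```python
# Targets keyed by device type, as printed above (1 = cpu/llvm, 2 = cuda).
targets = {
    1: "llvm -keys=cpu -link-params=0",
    2: "cuda -keys=cuda,gpu -max_num_threads=1024 -thread_warp_size=32",
}

# Hypothetical default: an expression with no device annotation falls back
# to device type 0, which is not in the map, so the lookup fails even
# though an llvm target was provided under key 1.
device_type = 0
if device_type not in targets:
    raise RuntimeError(f"No target is provided for device type {device_type}")
```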
I've also found that the number of extracted tasks seems different after the TE compiler PR. For example, TF2 SSD MobileNet V2 previously extracted 57 tasks, but now it extracts 130. Investigating…
For graph codegen, it seems the TE compiler doesn't repeatedly call auto_schedule_topi_compute on identical workloads, so each unique task is extracted only once and the weight is correctly updated by the te_compiler_update_weights call. This doesn't apply to VM codegen, where the same task is extracted multiple times. But since the key is different now, the workload cache lookup fails and identical tasks are returned as distinct tasks.
Yes, I believe a workload hash should be sufficient for the key, since previously identical workloads shared the same func name anyway (but now func names are unique).
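To illustrate the keying issue, here is a minimal sketch (not the actual extraction code): keying the workload cache by the hash alone collapses repeated call sites into one task with an accumulated weight, while keying by the now-unique func name splits them into duplicates.

```python
from collections import defaultdict

# call_sites: (func_name, workload_hash) pairs observed during lowering.
call_sites = [
    ("fused_conv2d", "hash_a"),
    ("fused_conv2d_1", "hash_a"),  # same workload, unique func name
    ("fused_dense", "hash_b"),
]

def count_tasks(key_fn):
    weights = defaultdict(int)
    for func_name, workload_hash in call_sites:
        weights[key_fn(func_name, workload_hash)] += 1
    return dict(weights)

# Keyed by workload hash: {'hash_a': 2, 'hash_b': 1} -- one task, weight 2.
print(count_tasks(lambda name, h: h))

# Keyed by (unique) func name: three "distinct" tasks with weight 1 each,
# which is the duplication described for the VM codegen path.
print(count_tasks(lambda name, h: name))
```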
I will follow up directly on GitHub; I think we will need to iterate on task extraction as we clean up the lowering, etc. I am preparing an RFC on unifying the lowering.