I’ve been following the tutorial here: https://web.tvm.apache.org/docs/how_to/tune_with_autotvm/tune_relay_x86.html to auto-tune a convolutional network for x86 CPUs, but I ran into an issue during the graph-level tuning phase.
The error I’m getting is:
Config for target=llvm -keys=cpu, workload=('dense_nopack.x86', ('TENSOR', (1, 512), 'float32'), ('TENSOR', (1000, 512), 'float32'), None, 'float32')
is missing in ApplyGraphBest context. A fallback configuration is used, which may bring great performance regression.
The warning appears while evaluating the model compiled with the graph-level schedules, and the end result is only marginally faster than kernel-level tuning alone, so the fallback configuration for this dense workload seems to be dragging performance down.
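For context, this is roughly how I apply the graph-optimized schedule at compile time, following the tutorial. The file name, the ResNet-18 workload, and the plain `llvm` target below are placeholders standing in for my actual setup:

```python
import tvm
from tvm import autotvm, relay
from tvm.relay import testing

# Placeholder names mirroring the tutorial, not my exact files/model.
graph_opt_sch_file = "resnet-18_graph_opt.log"
target = "llvm"

# Example network standing in for my own convolutional model.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1, dtype="float32")

# The "missing in ApplyGraphBest context" warning is printed while this
# dispatch context is active: no entry for the dense workload is found in
# graph_opt_sch_file, so a fallback config is used for that layer.
with autotvm.apply_graph_best(graph_opt_sch_file):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```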
Here’s what I’ve tried so far:
- Verified the `log_file` and `graph_opt_sch_file` paths.
- Checked that all workloads were extracted and included during tuning (see the sketch after this list).
- Adjusted the `number` and `repeat` parameters in `autotvm.LocalRunner`.
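For the workload check in the second bullet, this is the kind of sanity check I ran: compare the workloads Relay extracts against the workloads that actually have records in the kernel-level log. The log file name and the ResNet-18 stand-in are assumptions for the sketch; I also extract `nn.dense` alongside `nn.conv2d` here (the tutorial itself only extracts `nn.conv2d`):

```python
from tvm import autotvm, relay
from tvm.relay import testing

log_file = "resnet-18.log"  # placeholder: my kernel-level tuning records

# Example network standing in for my model.
mod, params = testing.resnet.get_workload(num_layers=18, batch_size=1, dtype="float32")

# Extract dense tasks as well as conv2d, to see whether the dense workload
# from the warning was ever part of the tuning run.
tasks = autotvm.task.extract_from_program(
    mod["main"],
    target="llvm",
    params=params,
    ops=(relay.op.get("nn.conv2d"), relay.op.get("nn.dense")),
)
extracted = {tsk.workload for tsk in tasks}

# Workloads that have at least one measurement record in the log.
tuned = {inp.task.workload for inp, _ in autotvm.record.load_from_file(log_file)}

print("extracted but not in the log:", extracted - tuned)
```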
Despite these steps, the issue persists. Has anyone encountered this problem? How can I ensure the correct configurations are applied to avoid fallback scenarios during graph-level tuning?
Any advice would be greatly appreciated.