We use auto-scheduler to tune our model on an x86-64 CPU; the model is saved in ONNX format and imported via Relay.
When the model is FP32, everything works fine.
But when the model is converted to INT8 by relay.quantize.qconfig, we receive the error "Cannot find tuned schedules for target=llvm -keys=cpu -link-params=0 -mcpu=tigerlake". It seems that the hash key of the task extracted by the auto-scheduler does not match that of the task being compiled.
We tried adding disabled_pass={"AutoSchedulerLayoutRewrite"} to the model compilation as below,
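A minimal sketch of that compilation flow, assuming `mod` and `params` come from the ONNX import and `log_file` is the auto-scheduler tuning log (both names are placeholders, not from the post):

```python
import tvm
from tvm import auto_scheduler, relay

# Sketch: apply the tuned records, enable auto-scheduler codegen, and
# disable the layout-rewrite pass that changes the task hash.
with auto_scheduler.ApplyHistoryBest(log_file):
    with tvm.transform.PassContext(
        opt_level=3,
        config={"relay.backend.use_auto_scheduler": True},
        disabled_pass={"AutoSchedulerLayoutRewrite"},
    ):
        lib = relay.build(mod, target="llvm -mcpu=tigerlake", params=params)
```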
The existing auto-scheduler doesn't support int8 optimization. For example, on Tiger Lake you cannot use VNNI with the auto-scheduler.
But the next iteration of our auto-scheduling system is being developed specifically with exploitation of HW-specific intrinsics in mind. Last week we landed initial support for auto-scheduling with VNNI; see https://github.com/apache/tvm/pull/11088 and the integration test for int8 BERT.
That is a correct assertion. At the same time, executing a neural network in int8 mode on any Intel CPU can give up to a 2x speedup; VNNI gives up to 4x.
@jinfagang you can try to see a perf boost even on a Core i7 using AutoTVM with the proper target, i.e. target = "llvm -mcpu=core-avx2" for a Core(TM) i7-10700. The x86 conv2d schedules are implemented with int8 intrinsics for SSE4.2/AVX2/AVX512/VNNI. Another note: int8 has not yet been enabled on all platforms for the fully connected layer (aka matmul/dense).
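To illustrate how an ISA maps to a target string, here is a hypothetical helper (not part of TVM) that picks an LLVM `-mcpu` value from a set of x86 feature flags; the flag-to-mcpu mapping is illustrative, not exhaustive:

```python
# Hypothetical helper: choose an LLVM -mcpu flag for the TVM target string
# from a set of x86 CPU feature flags.
def pick_x86_target(flags):
    if "avx512vnni" in flags:
        mcpu = "cascadelake"     # has VNNI int8 dot-product instructions
    elif "avx512f" in flags:
        mcpu = "skylake-avx512"
    elif "avx2" in flags:
        mcpu = "core-avx2"
    else:
        mcpu = "x86-64"          # generic baseline
    return "llvm -mcpu=" + mcpu

# On Linux the flags can be read from /proc/cpuinfo, e.g.:
# flags = set(open("/proc/cpuinfo").read().split())
print(pick_x86_target({"avx2", "sse4_2"}))  # llvm -mcpu=core-avx2
```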
@elvin-n Hi, I recently tuned a model with Ansor, but the speed is very slow, even slower than Eigen (the model is simply some matmul and layernorm etc., very basic matrix calculations).
Could that be because I only set target = llvm when searching, without any specs there?
BTW, how do I know what specs I should specify after llvm? I am not very familiar with LLVM itself or embedded params like avx etc.
Unfortunately, Ansor will not be able to generate efficient x86 int8 code. The int8 code above can be generated only with AutoTVM (so far).
It is not enough. There should be at least "llvm -mcpu=core-avx2". The full list of -mcpu values can be taken from here, depending on the target architecture/ISA.
As I mentioned above, efficient int8 is enabled on SSE/AVX2/AVX512 only for conv2d; dense requires hardware with VNNI. I.e. if your topology is mostly conv2d, you should see a significant perf gain after AutoTVM and codegen with the proper target.
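A minimal sketch of the AutoTVM flow being suggested, assuming `mod` and `params` come from the ONNX import ("tuning.log" is a placeholder file name, not from the thread):

```python
import tvm
from tvm import autotvm, relay

# Extract tunable tasks with an ISA-specific target so the int8 conv2d
# schedules (SSE4.2/AVX2/AVX512/VNNI) are selected.
target = "llvm -mcpu=core-avx2"
tasks = autotvm.task.extract_from_program(mod["main"], target=target, params=params)

# ... tune each task, then compile applying the best records:
with autotvm.apply_history_best("tuning.log"):
    with tvm.transform.PassContext(opt_level=3):
        lib = relay.build(mod, target=target, params=params)
```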