Hi, I’m new to TVM. Today I used TVM to speed up inference for a model in ONNX format, but afterwards inference is roughly 3–5× slower than the original model:
original model onnx: 0.7795 s
model building with tvm: 3.4420 s
model after tuning: 2.2411 s
Some code:
import time

import onnx
import tvm
from tvm import relay
from tvm.contrib import graph_executor

TARGET = "llvm"

st_onnx = time.time()
onnx_model = onnx.load(onnx_path)
input_name = "input"
shape_dict = {input_name: image_infos.shape}
mod, params = relay.frontend.from_onnx(onnx_model, shape_dict)
print("Time load onnx to tvm: {:0.4f}".format(time.time() - st_onnx))

with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target=TARGET, params=params)

model_module = graph_executor.GraphModule(lib["default"](tvm.device(TARGET, 0)))
model_module.set_input("input", image_infos)
model_module.run()
tvm_output = model_module.get_output(0)
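As a side note on the timings above: a single wall-clock run includes one-off costs (first-run initialization, cache warm-up), so a steadier comparison against onnxruntime uses TVM's `time_evaluator`. This is a sketch assuming the `model_module` built above:

```python
def benchmark(graph_module, dev, number=10, repeat=3):
    """Average inference latency measured with TVM's time_evaluator.

    Each repeat averages `number` runs; returns the per-repeat mean
    latencies in seconds, so outliers and warm-up are easy to spot.
    """
    timer = graph_module.module.time_evaluator("run", dev, number=number, repeat=repeat)
    return timer().results

# usage (assumes model_module and TARGET from the code above):
# print(benchmark(model_module, tvm.device(TARGET, 0)))
```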
# Tuning
from tvm import autotvm
from tvm.autotvm.tuner import XGBTuner

def tune(mod, params, X_ex):
    number = 10
    repeat = 1
    min_repeat_ms = 0  # since we're tuning on a CPU, this can be 0
    timeout = 10  # in seconds

    # create a TVM runner
    runner = autotvm.LocalRunner(
        number=number,
        repeat=repeat,
        timeout=timeout,
        min_repeat_ms=min_repeat_ms,
    )
    tuning_option = {
        "tuner": "xgb",
        "trials": 10,
        "early_stopping": 100,
        "measure_option": autotvm.measure_option(
            builder=autotvm.LocalBuilder(build_func="default"), runner=runner
        ),
        "tuning_records": "fied_extraction-autotuning.json",
    }
    tasks = autotvm.task.extract_from_program(mod["main"], target=TARGET, params=params)
    for i, task in enumerate(tasks):
        prefix = "[Task %2d/%2d] " % (i + 1, len(tasks))
        tuner_obj = XGBTuner(task, loss_type="rank")
        tuner_obj.tune(
            n_trial=min(tuning_option["trials"], len(task.config_space)),
            early_stopping=tuning_option["early_stopping"],
            measure_option=tuning_option["measure_option"],
            callbacks=[
                autotvm.callback.progress_bar(tuning_option["trials"], prefix=prefix),
                autotvm.callback.log_to_file(tuning_option["tuning_records"]),
            ],
        )

tune(mod, params, image_infos)
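One detail worth making explicit (I'm assuming the rebuild step, since it isn't shown above): tuning only writes the log file; `relay.build` has to be re-run inside `autotvm.apply_history_best` for the tuned schedules to actually be used. A minimal sketch:

```python
def build_with_history(mod, params, target, log_file):
    """Rebuild the Relay module with the tuned configs from `log_file` applied."""
    import tvm
    from tvm import autotvm, relay

    with autotvm.apply_history_best(log_file):
        with tvm.transform.PassContext(opt_level=3):
            return relay.build(mod, target=target, params=params)

# usage (same log file as in tuning_option):
# lib = build_with_history(mod, params, TARGET, "fied_extraction-autotuning.json")
```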
I don’t know why that is. Is there something I’m missing?
Cpu Info:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 12
On-line CPU(s) list: 0-11
Thread(s) per core: 2
Core(s) per socket: 6
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 165
Model name: Intel(R) Core(TM) i5-10500 CPU @ 3.10GHz
Stepping: 3
CPU MHz: 1114.296
CPU max MHz: 4500.0000
CPU min MHz: 800.0000
BogoMIPS: 6199.99
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 12288K
NUMA node0 CPU(s): 0-11
This is a Comet Lake CPU.
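Since Comet Lake supports AVX2, it may be worth trying a more specific target than plain `"llvm"`, which generates conservative generic code. The `-mcpu` value and thread count below are assumptions to experiment with, not a verified fix:

```python
import os

# Pin TVM's runtime thread pool to the 6 physical cores; hyper-threads
# rarely help compute-bound kernels. Must be set before running inference.
os.environ["TVM_NUM_THREADS"] = "6"

# A CPU-specific LLVM target enabling AVX2; `llc -mcpu=help` lists the
# names your LLVM build knows.
TARGET = "llvm -mcpu=core-avx2"
```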
I used these instructions to install TVM:
#!/bin/bash
set -ex
# https://tvm.apache.org/docs/install/from_source.html#install-from-source
if [[ ! -d "/tmp/tvm" ]]; then
    git clone --recursive https://github.com/apache/tvm /tmp/tvm
fi
apt-get update && \
    apt-get install -y gcc libtinfo-dev zlib1g-dev \
        build-essential cmake libedit-dev libxml2-dev \
        llvm-6.0 \
        libgomp1  # S0#61786308
if [[ ! -d "/tmp/tvm/build" ]]; then
    mkdir /tmp/tvm/build
fi
cp /tmp/tvm/cmake/config.cmake /tmp/tvm/build
mv /tmp/tvm/build/config.cmake /tmp/tvm/build/~config.cmake && \
cat /tmp/tvm/build/~config.cmake | \
sed -E "s|set\(USE_GRAPH_RUNTIME OFF\)|set\(USE_GRAPH_RUNTIME ON\)|" | \
sed -E "s|set\(USE_GRAPH_RUNTIME_DEBUG OFF\)|set\(USE_GRAPH_RUNTIME_DEBUG ON\)|" | \
sed -E "s|set\(USE_LLVM OFF\)|set\(USE_LLVM /usr/bin/llvm-config-6.0\)|" > \
/tmp/tvm/build/config.cmake
cd /tmp/tvm/build && cmake .. && make -j4
cd /tmp/tvm/python && /usr/local/envs/tvm/bin/python setup.py install --user && cd ..