[VTA] Do the VTA testbench scripts fully rely on AutoTVM?

Hi there, sorry for the long post.

I am porting the VTA architecture to my custom FPGA system (not Zynq-based; an UltraScale-based FPGA + RPi combination) as my research project. VTA is really interesting, and its system architecture is giving me some ideas that I want to try implementing in my own system.

Anyway, I have now almost finished porting VTA to my system, referencing the original Pynq VTA design (for this I added some driver code to pynq_driver and the runtime). The system can pass most of the tests under the ./integrations folder, but still fails some, especially test_benchmark_topi_conv2d_transpose.py.

To compare the Pynq and my custom system, I ran the test script on a genuine Pynq board with the default configuration described in the tutorial on the TVM website. I then noticed that the test script uses the AutoTVM mechanism to choose an optimized configuration for VTA.
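For context, the integration tests follow roughly this pattern (a minimal sketch; the real build and measurement logic lives in test_benchmark_topi_conv2d_transpose.py):

```python
from tvm import autotvm
import vta

env = vta.get_env()
target = env.target

# The TopHub context lets AutoTVM look up pre-tuned schedules published for
# the reference Pynq design. Without it, every workload falls back to a
# default, untuned configuration -- which is what the "Cannot find config ...
# A fallback configuration is used" warning below reports.
with autotvm.tophub.context(target):
    ...  # build and time the conv2d_transpose workload here
```

In principle, dropping the context should only cost performance, since the fallback configuration is still a valid schedule.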

I thought that if I removed `with autotvm.tophub.context(target):` from the script (because those tuning parameters may be invalid, or at least not optimal, for my system), the test should still pass, just with slower performance. So I disabled AutoTVM in the script and ran it on both the Pynq and the UltraScale boards. As a result, I got a failure on the UltraScale, which I expected (my port may still be incorrect). But I am not sure why it also fails on the Pynq, with this error:

nyacom@xxxx:~/project/fic/pynq/tvm/vta/tests/python/integration$ python3 test_benchmark_topi_conv2d_transpose.py
Conv2DTransposeWorkload(batch=1, height=4, width=4, in_filter=1024, out_filter=512, hkernel=4, wkernel=4, hpad=1, wpad=1, hstride=2, wstride=2, o_hpad=0, o_wpad=0)
Cannot find config for target=ext_dev -keys=vta,cpu -device=vta -model=pynq_1x16_i8w8a32_15_15_18_17, workload=('conv2d_transpose_packed.vta', ('TENSOR', (1, 64, 4, 4, 1, 16), 'int8'), ('TENSOR', (32, 64, 4, 4, 16, 16), 'int8'), (2, 2), (1, 1, 1, 1), 'int32', (0, 0)). A fallback configuration is used, which may bring great performance regression.
Traceback (most recent call last):
  File "test_benchmark_topi_conv2d_transpose.py", line 304, in <module>
    test_conv2d_transpose(device="vta")
  File "test_benchmark_topi_conv2d_transpose.py", line 299, in test_conv2d_transpose
    vta.testing.run(_run)
  File "/home/hlab/nyacom/project/fic/pynq/tvm/vta/python/vta/testing/utils.py", line 74, in run
    run_func(env, remote)
  File "test_benchmark_topi_conv2d_transpose.py", line 297, in _run
    run_conv2d_transpose(env, remote, wl, target)
  File "test_benchmark_topi_conv2d_transpose.py", line 257, in run_conv2d_transpose
    cost = time_f(data_arr, kernel_arr, res_arr)
  File "/home/hlab/nyacom/project/fic/pynq/tvm/python/tvm/runtime/module.py", line 226, in evaluator
    blob = feval(*args)
  File "/home/hlab/nyacom/project/fic/pynq/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm.error.RPCError: Traceback (most recent call last):
  [bt] (8) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(+0x11b8383) [0x7f41dcdb9383]
  [bt] (7) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const+0x1f5) [0x7f41dcdbdd95]
  [bt] (6) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)+0x43) [0x7f41dcdb1b53]
  [bt] (5) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)+0x38b) [0x7f41dcda854b]
  [bt] (4) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::HandleUntilReturnEvent(bool, std::function<void (tvm::runtime::TVMArgs)>)+0x2cb) [0x7f41dcda72bb]
  [bt] (3) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::EventHandler::HandleNextEvent(bool, bool, std::function<void (tvm::runtime::TVMArgs)>)+0x96) [0x7f41dcdb1996]
  [bt] (2) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::EventHandler::HandleProcessPacket(std::function<void (tvm::runtime::TVMArgs)>)+0x109) [0x7f41dcdb1799]
  [bt] (1) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(tvm::runtime::RPCEndpoint::EventHandler::HandleReturn(tvm::runtime::RPCCode, std::function<void (tvm::runtime::TVMArgs)>)+0x11a) [0x7f41dcdb15ca]
  [bt] (0) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm/build/libtvm.so(+0x11a52e7) [0x7f41dcda62e7]
  [bt] (8) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm_pynq/build/libtvm_runtime.so(+0x82c84) [0xb593ec84]
  [bt] (7) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm_pynq/build/libtvm_runtime.so(+0x829c2) [0xb593e9c2]
  [bt] (6) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm_pynq/build/libtvm_runtime.so(+0x41f00) [0xb58fdf00]
  [bt] (5) /tmp/tmpvy_5m0z_/conv2d_transpose.o.so(conv2d_transpose+0x43c) [0xb51dfcec]
  [bt] (4) /tmp/tmpvy_5m0z_/conv2d_transpose.o.so(+0x1318) [0xb51e0318]
  [bt] (3) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm_pynq/vta/python/vta/../../../build/libvta.so(VTAPushGEMMOp+0x847) [0xb61ab218]
  [bt] (2) /tmp/tmpvy_5m0z_/conv2d_transpose.o.so(+0x159c) [0xb51e059c]
  [bt] (1) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm_pynq/vta/python/vta/../../../build/libvta.so(VTAUopPush+0xc1) [0xb61a866a]
  [bt] (0) /home/wasmii2/usr/nyacom/project/fic/pynq/tvm_pynq/vta/python/vta/../../../build/libvta.so(+0x4eb2) [0xb61a6eb2]
  File "/home/hlab/nyacom/project/fic/pynq/tvm/src/runtime/rpc/rpc_endpoint.cc", line 378
RPCError: Error caught from RPC call:
[16:34:27] /home/hlab/nyacom/project/fic/pynq/tvm_pynq/vta/runtime/runtime.cc:258: Check failed: seq_[i].dst_idx != dst_index:

Does this result mean the test scripts fully rely on AutoTVM? Won't they run without tuning parameters?

Thanks.

I finally resolved the issue myself. It was a memory coherency problem, not AutoTVM or the testbench.

Because of my architecture, memory is not coherent between the PS part and the PL part (they are physically separated). I believe the VTA runtime supports a non-coherent mode, because there is a mode switch in pkg_config.py and a -DVTA_COHERENT_ACCESSES macro.
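Here is a hedged sketch of how that switch appears to be derived (approximating the upstream pkg_config.py from memory; check your checkout for the exact lines):

```python
def coherent_macro(target: str) -> str:
    """Approximate the -D flag that pkg_config.py passes to the runtime build.

    Sketch only: upstream seems to derive coherency from the target name,
    so a custom non-Zynq target like mine lands on the non-coherent side.
    """
    coherent = target in ["pynq", "ultra96"]  # Zynq family: PS/PL coherent
    return "-DVTA_COHERENT_ACCESSES=%s" % ("true" if coherent else "false")

print(coherent_macro("pynq"))    # -DVTA_COHERENT_ACCESSES=true
print(coherent_macro("custom"))  # -DVTA_COHERENT_ACCESSES=false
```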

But it seems that even with this switch set appropriately, the current VTA runtime does not synchronize the PS and PL memory areas under some situations or conditions… I think this behavior is not a problem on the Zynq architecture, so basically nobody has faced it except me.
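For anyone porting VTA to a similar split-memory system: the fix boils down to flushing CPU caches before the PL reads a buffer, and invalidating them before the PS reads results back. A hypothetical sketch of that pattern (the library name and both driver calls are placeholders I invented, not part of the stock VTA runtime):

```python
import ctypes

# Placeholder driver library for my board; NOT part of the stock VTA runtime.
driver = ctypes.CDLL("libmyboard_vta.so")

def run_on_vta(buf_ptr: int, size: int) -> None:
    """Wrap an accelerator call with explicit cache maintenance."""
    # Push dirty CPU cache lines out to DRAM so the PL sees current data.
    driver.vta_flush_region(ctypes.c_void_p(buf_ptr), ctypes.c_size_t(size))
    # ... issue the VTA instruction stream here ...
    # Drop stale CPU cache lines so the PS reads what the PL wrote.
    driver.vta_invalidate_region(ctypes.c_void_p(buf_ptr), ctypes.c_size_t(size))
```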
