[VTA] Recursion error

acapone13 · September 9, 2020, 11:43am

Yes, after several attempts at building and rebuilding, I was able to make it work on the Ultra-96, but I haven’t been as lucky with the PYNQ-Z2 board. Thanks all the same for the quick response.

tkbd · October 4, 2020, 1:12pm

Any update on this issue? I tried rebuilding multiple times on the PYNQ side but could not get it to work.

jadehsuu · October 10, 2020, 6:16am

Turn on the runtime option in config.cmake on the board and re-run TVM.
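If "the runtime" here means the VTA FPGA runtime switch, the option the VTA install guide turns on is USE_VTA_FPGA in build/config.cmake (treat that mapping as an assumption). Below is a small, idempotent Python sketch of that edit; enable_cmake_option is a hypothetical helper, and TVM still has to be rebuilt afterwards.

```python
from pathlib import Path


def enable_cmake_option(config_path, option):
    """Ensure config.cmake contains `set(<option> ON)`, appending it if absent.

    Hypothetical helper: which option counts as "the runtime" is an assumption.
    """
    path = Path(config_path)
    lines = path.read_text().splitlines()
    wanted = f"set({option} ON)"
    prefix = f"set({option} "
    found = False
    for i, line in enumerate(lines):
        if line.strip().startswith(prefix):
            lines[i] = wanted  # overwrite an existing OFF (or other) value
            found = True
    if not found:
        lines.append(wanted)
    path.write_text("\n".join(lines) + "\n")
```

For example, enable_cmake_option("/home/xilinx/tvm/build/config.cmake", "USE_VTA_FPGA"), followed by cmake .. and make runtime vta -j2 in the build directory.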

kilsenp · December 13, 2020, 4:39pm

Hi, I am still struggling with this issue:

Running a PYNQ-Z2 with PYNQ 2.6.0. On both host and device I am using TVM v0.7.0 (728b82957).

Output on device:

INFO:root:Loading VTA library: /home/xilinx/tvm/build/libvta.so
INFO:RPCServer:load_module /tmp/tmpjsunht7f/conv2d.o           
INFO:root:Loading VTA library: /home/xilinx/tvm/build/libvta.so
INFO:RPCServer:load_module /tmp/tmpjsunht7f/conv2d.o           
INFO:root:Loading VTA library: /home/xilinx/tvm/build/libvta.so
INFO:RPCServer:load_module /tmp/tmpjsunht7f/conv2d.o           
Process Process-1:5:                                           
Traceback (most recent call last):                             
  File "/usr/lib/python3.6/multiprocessing/process.py", line 258, in _bootstrap
    self.run()                                                 
  File "/usr/lib/python3.6/multiprocessing/process.py", line 93, in run
    self._target(*self._args, **self._kwargs)                  
  File "/home/xilinx/tvm/python/tvm/rpc/server.py", line 118, in _serve_loop
    _ffi_api.ServerLoop(sockfd)                                
  File "/home/xilinx/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()                                 
AttributeError: Traceback (most recent call last):             
  [bt] (4) /home/xilinx/tvm/build/libtvm_runtime.so(TVMFuncCall+0x37) [0xb567280c]
  [bt] (3) /home/xilinx/tvm/build/libtvm_runtime.so(+0x898fc) [0xb56cd8fc]
  [bt] (2) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCServerLoop(int)+0x6b) [0xb56cd7a0]
  [bt] (1) /home/xilinx/tvm/build/libtvm_runtime.so(tvm::runtime::RPCEndpoint::ServerLoop()+0x125) [0xb56bb79a]
  [bt] (0) /home/xilinx/tvm/build/libtvm_runtime.so(+0x2c66c) [0xb567066c]
  File "/home/xilinx/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
    rv = local_pyfunc(*pyargs)
  File "/home/xilinx/tvm/vta/python/vta/exec/rpc_server.py", line 85, in server_shutdown
    runtime_dll[0].VTARuntimeShutdown()
  File "/usr/lib/python3.6/ctypes/__init__.py", line 361, in __getattr__
    func = self.__getitem__(name)
  File "/usr/lib/python3.6/ctypes/__init__.py", line 366, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /home/xilinx/tvm/build/libvta.so: undefined symbol: VTARuntimeShutdown
INFO:RPCServer:connection from ('192.168.178.21', 55061)
INFO:root:Generating grammar tables from /usr/lib/python3.6/lib2to3/Grammar.txt
INFO:root:Generating grammar tables from /usr/lib/python3.6/lib2to3/PatternGrammar.txt
INFO:root:Program FPGA with 1x16_i8w8a32_15_15_18_17.bit
INFO:root:Skip reconfig_runtime due to same config.
INFO:root:Loading VTA library: /home/xilinx/tvm/build/libvta.so
INFO:RPCServer:load_module /tmp/tmptelh1dui/conv2d.o
INFO:root:Loading VTA library: /home/xilinx/tvm/build/libvta.so
INFO:root:Loading VTA library: /home/xilinx/tvm/build/libvta.so

After that, it repeatedly loads the library until it crashes.

On the host, it’s the same error as before.

I rebuilt and cleaned multiple times without success. What exactly is the issue? And why is make clean supposed to help?

denishem · January 12, 2021, 4:18pm

Hi, I managed to solve this on PYNQ-Z2 v2.6.0 and TVM 0.8.dev0.

On the PYNQ board

After a fresh git clone:

Define a home directory (I was using the Jupyter terminal; this might already be set if you use SSH):

export HOME=/home/xilinx

Then add a few more variables to .bashrc:

export TVM_HOME=/path/to/tvm
export PYTHONPATH=$TVM_HOME/python:$TVM_HOME/vta/python:$PYTHONPATH
export VTA_HW_PATH=$TVM_HOME/3rdparty/vta-hw

Then run source .bashrc.
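A mistyped path in these exports (for example a wrong 3rdparty directory name) will not necessarily fail loudly. A quick Python check, run in the same shell after sourcing .bashrc, can confirm the variables point where they should. This is a sketch: check_vta_env is a hypothetical helper, and it only verifies that the directories exist and that PYTHONPATH contains the two TVM Python paths.

```python
import os


def check_vta_env(env=None):
    """Hypothetical helper: list problems with the TVM/VTA environment variables."""
    env = os.environ if env is None else env
    problems = []
    for name in ("TVM_HOME", "VTA_HW_PATH"):
        value = env.get(name)
        if not value:
            problems.append(f"{name} is not set")
        elif not os.path.isdir(value):
            problems.append(f"{name}={value} is not an existing directory")
    tvm_home = env.get("TVM_HOME")
    if tvm_home:
        # Normalize so trailing slashes or "//" do not cause false alarms.
        entries = {os.path.normpath(p)
                   for p in env.get("PYTHONPATH", "").split(os.pathsep) if p}
        for sub in ("python", os.path.join("vta", "python")):
            wanted = os.path.normpath(os.path.join(tvm_home, sub))
            if wanted not in entries:
                problems.append(f"PYTHONPATH does not contain {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_vta_env():
        print("WARNING:", problem)
```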

Finally, just follow https://tvm.apache.org/docs/vta/install.html#pynq-side-rpc-server-build-deployment

The important part is to build twice:

make runtime vta -j2
# FIXME (tmoreau89): remove this step by fixing the cmake build
make clean; make runtime vta -j2
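The build-twice workaround, generalized into a retry loop, could look like the Python sketch below. Assumptions: you pass the TVM build directory and the path of libvta.so, and a build counts as complete once libvta.so exports VTARuntimeShutdown (the symbol reported missing in the tracebacks in this thread). build_until_complete and exports_symbol are hypothetical helpers, max_attempts is an arbitrary cap, and run/has_symbol are injectable so the loop can be exercised without a board.

```python
import subprocess
import sys

# The symbol check runs in a child interpreter: dlopen() hands back the already
# loaded handle for a path it has seen, so re-checking a rebuilt libvta.so
# in-process could report the stale library.
_CHECK = (
    "import ctypes, sys\n"
    "lib = ctypes.CDLL(sys.argv[1])\n"
    "sys.exit(0 if hasattr(lib, sys.argv[2]) else 1)\n"
)


def exports_symbol(lib_path, symbol):
    """True if the library at lib_path loads and exports symbol (hypothetical helper)."""
    result = subprocess.run([sys.executable, "-c", _CHECK, lib_path, symbol],
                            stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return result.returncode == 0


def build_until_complete(build_dir, lib_path, symbol="VTARuntimeShutdown",
                         max_attempts=4, run=subprocess.run,
                         has_symbol=exports_symbol):
    """Build, then `make clean` and rebuild, until lib_path exports symbol.

    Returns the number of build attempts that were needed.
    """
    for attempt in range(1, max_attempts + 1):
        if attempt > 1:
            run(["make", "clean"], cwd=build_dir, check=True)
        run(["make", "runtime", "vta", "-j2"], cwd=build_dir, check=True)
        if has_symbol(lib_path, symbol):
            return attempt
    raise RuntimeError(
        f"{symbol} still missing from {lib_path} after {max_attempts} builds")
```

On the board this would be called as build_until_complete("/home/xilinx/tvm/build", "/home/xilinx/tvm/build/libvta.so"). Note that stdout/stderr redirection is used instead of capture_output so the sketch also runs on the Python 3.6 shipped with PYNQ 2.6.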

test_benchmark_topi_conv2d and deploy_classification then work perfectly.

kilsenp · February 21, 2021, 9:50pm

Hi denishem, thanks for the response. I finally managed to get it running. I had to run this at least 4 times, and since I wasn’t sure what the issue was, I also tried different TVM versions and so on.

However, the issue is not related to the TVM code (with 0.7.0). From my understanding, it has something to do with the limited resources of the PYNQ: it cannot build the whole thing in one run. That’s why building multiple times helps, until the whole VTA library has been built. For other people, you should see a lot more

make[4]: Entering directory '/home/xilinx/tvm/build'
make[4]: Leaving directory '/home/xilinx/tvm/build'

than when building on the first try; that is how you can be sure it’s working.
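To make that comparison less eyeball-driven, each build’s output can be saved (for example with make runtime vta -j2 2>&1 | tee build1.log; the file names are arbitrary) and the make[4] lines counted. A minimal Python sketch follows; count_make_directory_lines and compare_logs are hypothetical helpers, and treating a higher count as a more complete build is the heuristic from this post, not something make guarantees.

```python
import re

# GNU make prints lines such as
# "make[4]: Entering directory '/home/xilinx/tvm/build'".
_MAKE_DIR = re.compile(r"^make\[(\d+)\]: (?:Entering|Leaving) directory ")


def count_make_directory_lines(log_text, level=4):
    """Count make[level] Entering/Leaving directory lines in a build log."""
    count = 0
    for line in log_text.splitlines():
        match = _MAKE_DIR.match(line.strip())
        if match and int(match.group(1)) == level:
            count += 1
    return count


def compare_logs(first_path, second_path, level=4):
    """Return the (first, second) counts for two saved build logs."""
    counts = []
    for path in (first_path, second_path):
        with open(path, errors="replace") as f:
            counts.append(count_make_directory_lines(f.read(), level))
    return tuple(counts)
```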

Still using PYNQ 2.6.0 and TVM v0.7.0.

Finally, I can get started with this.

JUNSUNGKIM99 · March 27, 2021, 6:29am

Any update on this problem?

Arina · April 16, 2021, 9:21am

Hello @thierry,

I have the same problem when building libvta.so for the de10nano target. After building with

$ make runtime vta -j2

or after the RPC server rebuilds the runtime, the symbol VTARuntimeShutdown is not present in libvta.so (and the suggested fix of running $ make clean; make runtime vta -j2 several times has no effect).

VTARuntimeShutdown is defined in tvm/vta/runtime/runtime.cc, so I edited VTA.cmake, because it seems that for the de10nano target, runtime.cc ends up missing from FPGA_RUNTIME_SRCS (marks 1 and 2). With this fix, libvta.so builds 2x larger than before and VTARuntimeShutdown is present.

Also, there seems to be a typo in VTA.cmake (see the 3rd mark): FSIM_RUNTIME_SRCS. It doesn’t matter, though, because ${VTA_HW_PATH}/src doesn’t contain any .cc files.

Since VTARuntimeShutdown also disappears from libvta.so after the RPC server rebuilds the runtime, I added a similar fix to 3rdparty/vta-hw/config/pkg_config.py.

Could you tell me whether the things I found are really errors in VTA.cmake and pkg_config.py?
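One way to check this independently of how the source list is produced (FPGA_RUNTIME_SRCS on the CMake side, or whatever list pkg_config.py hands to the RPC server’s rebuild) is to print that list and confirm that runtime.cc is in it. A small Python sketch, where missing_runtime_sources is a hypothetical helper and the required path is taken from the post above.

```python
from pathlib import PurePosixPath


def missing_runtime_sources(sources, required=("vta/runtime/runtime.cc",)):
    """Hypothetical helper: return required files (matched by path suffix)
    that do not appear in the given list of source paths."""
    normalized = [PurePosixPath(s).as_posix() for s in sources]
    missing = []
    for req in required:
        if not any(s == req or s.endswith("/" + req) for s in normalized):
            missing.append(req)
    return missing
```

An empty result means runtime.cc, and therefore VTARuntimeShutdown, should end up in the library; a non-empty one points at the source-list configuration rather than at a flaky build.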

The thing is, various tests now fail with the same error. I doubt the cause lies in how libvta was built, but I still want to make sure.

(~$ python3 /home/arinan/tvm_main_1792837/vta/tutorials/test_program_rpc.py succeeds, and the bitstream is successfully downloaded to the FPGA.)

Here is the error log on the host side:

~$ python3 /home/arinan/tvm_main_1792837/vta/tutorials/matrix_multiply.py 
primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {C: Buffer(C_2: Pointer(int8), int8, [1, 16, 1, 16], []),
             A: Buffer(A_2: Pointer(int8), int8, [1, 16, 1, 16], []),
             B: Buffer(B_2: Pointer(int8), int8, [16, 16, 16, 16], [])}
  buffer_map = {A_1: A, B_1: B, C_1: C} {
  attr [A_buf: Pointer(int8)] "storage_scope" = "global";
  allocate(A_buf, int8, [256]);
  attr [B_buf: Pointer(int8)] "storage_scope" = "global";
  allocate(B_buf, int8, [65536]);
  attr [C_buf: Pointer(int32)] "storage_scope" = "global";
  allocate(C_buf, int32, [256]) {
    for (i1: int32, 0, 16) {
      for (i3: int32, 0, 16) {
        A_buf[((i1*16) + i3)] = (int8*)A_2[((i1*16) + i3)]
      }
    }
    for (i0: int32, 0, 16) {
      for (i1_1: int32, 0, 16) {
        for (i2: int32, 0, 16) {
          for (i3_1: int32, 0, 16) {
            B_buf[((((i0*4096) + (i1_1*256)) + (i2*16)) + i3_1)] = (int8*)B_2[((((i0*4096) + (i1_1*256)) + (i2*16)) + i3_1)]
          }
        }
      }
    }
    for (co: int32, 0, 16) {
      for (ci: int32, 0, 16) {
        C_buf[((co*16) + ci)] = 0
        for (ko: int32, 0, 16) {
          for (ki: int32, 0, 16) {
            C_buf[((co*16) + ci)] = ((int32*)C_buf[((co*16) + ci)] + (cast(int32, (int8*)A_buf[((ko*16) + ki)])*cast(int32, (int8*)B_buf[((((co*4096) + (ko*256)) + (ci*16)) + ki)])))
          }
        }
      }
    }
    for (i1_2: int32, 0, 16) {
      for (i3_2: int32, 0, 16) {
        C_2[((i1_2*16) + i3_2)] = cast(int8, (int32*)C_buf[((i1_2*16) + i3_2)])
      }
    }
  }
}


primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {C: Buffer(C_2: Pointer(int8), int8, [1, 16, 1, 16], []),
             A: Buffer(A_2: Pointer(int8), int8, [1, 16, 1, 16], []),
             B: Buffer(B_2: Pointer(int8), int8, [16, 16, 16, 16], [])}
  buffer_map = {A_1: A, B_1: B, C_1: C} {
  attr [C_buf: Pointer(int32)] "storage_scope" = "local.acc_buffer";
  allocate(C_buf, int32, [256]);
  attr [A_buf: Pointer(int8)] "storage_scope" = "local.inp_buffer";
  allocate(A_buf, int8, [16]);
  attr [B_buf: Pointer(int8)] "storage_scope" = "local.wgt_buffer";
  allocate(B_buf, int8, [16]) {
    for (co: int32, 0, 16) {
      for (ci: int32, 0, 16) {
        C_buf[((co*16) + ci)] = 0
        for (ko: int32, 0, 16) {
          attr [IterVar(i0: int32, (nullptr), "DataPar", "")] "pragma_dma_copy" = 1;
          for (i3: int32, 0, 16) {
            A_buf[i3] = (int8*)A_2[((ko*16) + i3)]
          }
          attr [IterVar(i0_1: int32, (nullptr), "DataPar", "")] "pragma_dma_copy" = 1;
          for (i3_1: int32, 0, 16) {
            B_buf[i3_1] = (int8*)B_2[((((co*4096) + (ko*256)) + (ci*16)) + i3_1)]
          }
          for (ki: int32, 0, 16) {
            C_buf[((co*16) + ci)] = ((int32*)C_buf[((co*16) + ci)] + (cast(int32, (int8*)A_buf[ki])*cast(int32, (int8*)B_buf[ki])))
          }
        }
      }
    }
    attr [IterVar(i0_2: int32, (nullptr), "DataPar", "")] "pragma_dma_copy" = 1;
    for (i1: int32, 0, 16) {
      for (i3_2: int32, 0, 16) {
        C_2[((i1*16) + i3_2)] = cast(int8, (int32*)C_buf[((i1*16) + i3_2)])
      }
    }
  }
}


primfn(A_1: handle, B_1: handle, C_1: handle) -> ()
  attr = {"global_symbol": "main", "tir.noalias": True}
  buffers = {C: Buffer(C_2: Pointer(int8), int8, [1, 16, 1, 16], []),
             A: Buffer(A_2: Pointer(int8), int8, [1, 16, 1, 16], []),
             B: Buffer(B_2: Pointer(int8), int8, [16, 16, 16, 16], [])}
  buffer_map = {A_1: A, B_1: B, C_1: C} {
  attr [C_buf: Pointer(int32)] "storage_scope" = "local.acc_buffer";
  attr [A_buf: Pointer(int8)] "storage_scope" = "local.inp_buffer";
  attr [B_buf: Pointer(int8)] "storage_scope" = "local.wgt_buffer" {
    attr [IterVar(vta: int32, (nullptr), "ThreadIndex", "vta")] "coproc_scope" = 2 {
      attr [IterVar(vta, (nullptr), "ThreadIndex", "vta")] "coproc_uop_scope" = "VTAPushGEMMOp" {
        @tir.call_extern("VTAUopLoopBegin", 16, 1, 0, 0, dtype=int32)
        @tir.vta.uop_push(0, 1, 0, 0, 0, 0, 0, 0, dtype=int32)
        @tir.call_extern("VTAUopLoopEnd", dtype=int32)
      }
      @tir.vta.coproc_dep_push(2, 1, dtype=int32)
    }
    for (ko: int32, 0, 16) {
      attr [IterVar(vta, (nullptr), "ThreadIndex", "vta")] "coproc_scope" = 1 {
        @tir.vta.coproc_dep_pop(2, 1, dtype=int32)
        @tir.call_extern("VTALoadBuffer2D", @tir.tvm_thread_context(@tir.vta.command_handle(, dtype=handle), dtype=handle), A_2, ko, 1, 1, 1, 0, 0, 0, 0, 0, 2, dtype=int32)
        @tir.call_extern("VTALoadBuffer2D", @tir.tvm_thread_context(@tir.vta.command_handle(, dtype=handle), dtype=handle), B_2, ko, 1, 16, 16, 0, 0, 0, 0, 0, 1, dtype=int32)
        @tir.vta.coproc_dep_push(1, 2, dtype=int32)
      }
      attr [IterVar(vta, (nullptr), "ThreadIndex", "vta")] "coproc_scope" = 2 {
        @tir.vta.coproc_dep_pop(1, 2, dtype=int32)
        attr [IterVar(vta, (nullptr), "ThreadIndex", "vta")] "coproc_uop_scope" = "VTAPushGEMMOp" {
          @tir.call_extern("VTAUopLoopBegin", 16, 1, 0, 1, dtype=int32)
          @tir.vta.uop_push(0, 0, 0, 0, 0, 0, 0, 0, dtype=int32)
          @tir.call_extern("VTAUopLoopEnd", dtype=int32)
        }
        @tir.vta.coproc_dep_push(2, 1, dtype=int32)
      }
    }
    @tir.vta.coproc_dep_push(2, 3, dtype=int32)
    @tir.vta.coproc_dep_pop(2, 1, dtype=int32)
    attr [IterVar(vta, (nullptr), "ThreadIndex", "vta")] "coproc_scope" = 3 {
      @tir.vta.coproc_dep_pop(2, 3, dtype=int32)
      @tir.call_extern("VTAStoreBuffer2D", @tir.tvm_thread_context(@tir.vta.command_handle(, dtype=handle), dtype=handle), 0, 4, C_2, 0, 16, 1, 16, dtype=int32)
    }
    @tir.vta.coproc_sync(, dtype=int32)
  }
}


Traceback (most recent call last):
  File "/home/arinan/tvm_main_1792837/vta/tutorials/matrix_multiply.py", line 438, in <module>
    f(A_nd, B_nd, C_nd)
  File "/home/arinan/tvm_main_1792837/python/tvm/runtime/module.py", line 115, in __call__
    return self.entry_func(*args)
  File "/home/arinan/tvm_main_1792837/python/tvm/_ffi/_ctypes/packed_func.py", line 237, in __call__
    raise get_last_ffi_error()
tvm._ffi.base.TVMError: Traceback (most recent call last):
  7: TVMFuncCall
        at /home/arinan/tvm_main_1792837/src/runtime/c_runtime_api.cc:480
  6: tvm::runtime::PackedFunc::CallPacked(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/arinan/tvm_main_1792837/include/tvm/runtime/packed_func.h:1150
  5: std::function<void (tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)>::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /usr/include/c++/7/bits/std_function.h:706
  4: _M_invoke
        at /usr/include/c++/7/bits/std_function.h:316
  3: operator()
        at /home/arinan/tvm_main_1792837/src/runtime/rpc/rpc_module.cc:278
  2: tvm::runtime::RPCWrappedFunc::operator()(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*) const
        at /home/arinan/tvm_main_1792837/src/runtime/rpc/rpc_module.cc:127
  1: tvm::runtime::RPCClientSession::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)> const&)
        at /home/arinan/tvm_main_1792837/src/runtime/rpc/rpc_endpoint.cc:980
  0: tvm::runtime::RPCEndpoint::CallFunc(void*, TVMValue const*, int const*, int, std::function<void (tvm::runtime::TVMArgs)>)
        at /home/arinan/tvm_main_1792837/src/runtime/rpc/rpc_endpoint.cc:797
  File "/home/arinan/tvm_main_1792837/src/runtime/rpc/rpc_endpoint.cc", line 797
TVMError:
---------------------------------------------------------------
An internal invariant was violated during the execution of TVM.
Please read TVM's error reporting guidelines.
More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.
---------------------------------------------------------------
  Check failed: (code == RPCCode::kReturn) is false: code=1

Here is the log from the target:

INFO:root:*************TARGET:'de10nano'
INFO:root:Skip reconfig_runtime due to same config.
INFO:root:Loading VTA library: /home/fpga/tvm_main_1792837/vta/python/vta/../../../build/libvta.so
INFO:RPCServer:load_module /tmp/tmpbtuyiox3/gemm.o
[18:17:30] /home/fpga/tvm_main_1792837/3rdparty/cma/cma_api_impl.h:154: Allocating 256 bytes of contigous memory
[18:17:30] /home/fpga/tvm_main_1792837/3rdparty/cma/cma_api_impl.h:154: Allocating 65536 bytes of contigous memory
[18:17:30] /home/fpga/tvm_main_1792837/3rdparty/cma/cma_api_impl.h:154: Allocating 256 bytes of contigous memory
[18:17:30] /home/fpga/tvm_main_1792837/3rdparty/cma/cma_api_impl.h:154: Allocating 33554432 bytes of contigous memory
[18:17:30] /home/fpga/tvm_main_1792837/3rdparty/cma/cma_api_impl.h:154: Allocating 33554432 bytes of contigous memory

jjyyjay · September 13, 2021, 3:18am

I followed the TVM and VTA installation guides and encountered the same error when running test_benchmark_topi_conv2d.py and vta_get_started.py on a PYNQ-Z2 board. The error occurred when accessing ctx(remote[0]:ext_dev(0)) in vta_get_started.py. It seems that tvm.rpc.RPCSession doesn’t construct the extension device properly. For example, when I tried to access a ctx attribute in vta_get_started.py (tvm.runtime.Device.exist), the recursion error occurred.

youxiudeshouyeren · January 20, 2022, 8:13am

I have solved the problem. The error occurs because the RPC server on the PYNQ device rebuilds libvta.so. On the device, it gets the list of source files for the library from pkg_config.py, but with wrong settings, such as incorrect environment variables, it cannot find the correct source files. The resulting library lacks some symbols, which leads to the undefined symbols and the infinite recursion.

The solution is to comment out the code in rpc_server.py that rebuilds the runtime library, and instead run the build manually each time you need to rebuild libvta.so.
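With the automatic rebuild disabled, it can help to verify the manually built library right after loading it, so that a build with missing symbols fails immediately with a clear message instead of surfacing later as an undefined-symbol AttributeError inside the RPC server. A Python sketch, assuming the library is loaded through ctypes as in the tracebacks above; load_checked and its required-symbol list are hypothetical, and VTARuntimeShutdown is only the symbol seen missing in this thread.

```python
import ctypes


def load_checked(lib_path, required=("VTARuntimeShutdown",)):
    """Hypothetical helper: load a shared library and fail fast if any
    required symbol is missing."""
    lib = ctypes.CDLL(lib_path, mode=ctypes.RTLD_GLOBAL)
    missing = [name for name in required if not hasattr(lib, name)]
    if missing:
        raise RuntimeError(
            f"{lib_path} is missing {', '.join(missing)}; rebuild it manually "
            "(make runtime vta) before starting the RPC server")
    return lib
```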

eamicheal · February 14, 2022, 11:12am

Hello,

I installed PYNQ v2.7 on my board and followed this tutorial (the VTA Installation Guide in the tvm 0.9.dev182+ge718f5a8a documentation), but I am getting the error “No Module Called Pynq” on the host side. Can anyone help? It appears to be caused by the venv used in v2.7.
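One way to test the venv explanation is to check, with the same interpreter that runs the RPC server or the failing script, whether pynq is importable from it. A minimal standard-library sketch; report_module is a hypothetical helper. If it reports the module as not importable, that interpreter is likely not the one pynq was installed into.

```python
import importlib.util
import sys


def report_module(name="pynq"):
    """Hypothetical helper: print the running interpreter and whether it can
    import `name`; return True if the module is importable."""
    spec = importlib.util.find_spec(name)
    print(f"interpreter: {sys.executable}")
    if spec is None:
        print(f"{name}: NOT importable from this interpreter")
        return False
    print(f"{name}: found at {spec.origin}")
    return True
```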

singhae · March 7, 2023, 7:15am

Hi,

I have the same problem: “no module named pynq” when running ./test_program_rpc.py.

Did you solve this problem?

If you don’t mind, could you please provide more information about the issue you are facing?

Kind regards,

singhae · March 27, 2023, 5:01am

Hi,

Any help fixing the RPC server problem?

Setup: PYNQ-Z1 board, PYNQ-Z1 image v3.0.1, TVM 12.0, Ubuntu 20.04.

RPCError: Error caught from RPC call: FileNotFoundError: [Errno 2] No such file or directory: '/tmp/tmpmbhpp83t/1x16_i8w8a32_15_15_18_17.bit'

Thank you all!