Memory allocation check failing with custom pynq fpga bit stream

rsahu · November 6, 2020, 6:58pm

Hi,

Sorry, I am new to TVM/VTA, so this may be a naive question, but any information would be helpful. I am trying to run the default conv2d ResNet benchmark with custom VTA bitstreams on my ZedBoard FPGA, which I have flashed with PYNQ v2.4. How should I change the default vta_config.json? Is there a systematic way to do it? Some of my config variations work, but others give errors. My runs so far:

Run1:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 5, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 15, "LOG_INP_BUFF_SIZE" : 16, "LOG_WGT_BUFF_SIZE" : 18, "LOG_ACC_BUFF_SIZE" : 17 }

Output:

Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=64, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00597205 sec/op, 38.7155 GOPS
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=128, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00312467 sec/op, 36.9977 GOPS
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=128, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00150457 sec/op, 8.53734 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=128, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00532538 sec/op, 43.4168 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=256, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00275541 sec/op, 41.9559 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=256, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.0011002 sec/op, 11.6752 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=256, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00501686 sec/op, 46.0868 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=512, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00394636 sec/op, 29.2942 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=512, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00111623 sec/op, 11.5075 GOPS
Conv2DWorkload(batch=1, height=7, width=7, in_filter=512, out_filter=512, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00716149 sec/op, 32.2853 GOPS

Run2:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 5, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 15, "LOG_INP_BUFF_SIZE" : 15, "LOG_WGT_BUFF_SIZE" : 18, "LOG_ACC_BUFF_SIZE" : 17 }

Output:

Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=64, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
CPU CONV2D TEST PASSED: Time cost = 0.143524 sec/op, 1.61095 GOPS
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=128, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
CPU CONV2D TEST PASSED: Time cost = 0.0737952 sec/op, 1.56657 GOPS
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=128, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
CPU CONV2D TEST PASSED: Time cost = 0.00987328 sec/op, 1.30099 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=128, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
CPU CONV2D TEST PASSED: Time cost = 0.157858 sec/op, 1.46468 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=256, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
CPU CONV2D TEST PASSED: Time cost = 0.07499 sec/op, 1.54161 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=256, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
CPU CONV2D TEST PASSED: Time cost = 0.00941673 sec/op, 1.36407 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=256, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
CPU CONV2D TEST PASSED: Time cost = 0.145916 sec/op, 1.58454 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=512, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
CPU CONV2D TEST PASSED: Time cost = 0.0976691 sec/op, 1.18365 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=512, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
CPU CONV2D TEST PASSED: Time cost = 0.0100878 sec/op, 1.27332 GOPS
Conv2DWorkload(batch=1, height=7, width=7, in_filter=512, out_filter=512, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
CPU CONV2D TEST PASSED: Time cost = 0.188419 sec/op, 1.22711 GOPS
is bitstream/home/cpre482/482/tvm/3rdparty/vta-hw/build/hardware/xilinx/vivado/pynq_1x16_i8w8a32_15_15_18_17/export/vta.bit
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
"-target" is deprecated, use "-mtriple" instead.
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=64, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00656723 sec/op, 35.2068 GOPS
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=128, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00312193 sec/op, 37.0301 GOPS
Conv2DWorkload(batch=1, height=56, width=56, in_filter=64, out_filter=128, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00150426 sec/op, 8.53913 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=128, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00532574 sec/op, 43.4139 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=256, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00276042 sec/op, 41.8796 GOPS
Conv2DWorkload(batch=1, height=28, width=28, in_filter=128, out_filter=256, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00110474 sec/op, 11.6273 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=256, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00501897 sec/op, 46.0674 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=512, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00394676 sec/op, 29.2912 GOPS
Conv2DWorkload(batch=1, height=14, width=14, in_filter=256, out_filter=512, hkernel=1, wkernel=1, hpad=0, wpad=0, hstride=2, wstride=2)
VTA CONV2D TEST PASSED: Time cost = 0.00111636 sec/op, 11.5062 GOPS
Conv2DWorkload(batch=1, height=7, width=7, in_filter=512, out_filter=512, hkernel=3, wkernel=3, hpad=1, wpad=1, hstride=1, wstride=1)
VTA CONV2D TEST PASSED: Time cost = 0.00715787 sec/op, 32.3016 GOPS
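
The bitstream directory name in the log above (pynq_1x16_i8w8a32_15_15_18_17) appears to encode the config values. Here is a minimal sketch that rebuilds that name from the Run2 config. The naming pattern is an assumption inferred from this single path, the function name `bitstream_name` is hypothetical, and this is not VTA's own code:

```python
def bitstream_name(cfg: dict) -> str:
    """Rebuild the bitstream directory name from a vta_config.json dict.

    Assumed pattern (inferred from one observed path, not from VTA source):
    TARGET_{batch}x{block}_i{inp}w{wgt}a{acc}_{uop}_{inp}_{wgt}_{acc}
    """
    batch = 1 << cfg["LOG_BATCH"]        # LOG_BATCH = 0 -> batch of 1
    block = 1 << cfg["LOG_BLOCK"]        # LOG_BLOCK = 4 -> block of 16
    inp_bits = 1 << cfg["LOG_INP_WIDTH"]  # 3 -> 8-bit inputs
    wgt_bits = 1 << cfg["LOG_WGT_WIDTH"]  # 3 -> 8-bit weights
    acc_bits = 1 << cfg["LOG_ACC_WIDTH"]  # 5 -> 32-bit accumulator
    return (
        f"{cfg['TARGET']}_{batch}x{block}"
        f"_i{inp_bits}w{wgt_bits}a{acc_bits}"
        f"_{cfg['LOG_UOP_BUFF_SIZE']}_{cfg['LOG_INP_BUFF_SIZE']}"
        f"_{cfg['LOG_WGT_BUFF_SIZE']}_{cfg['LOG_ACC_BUFF_SIZE']}"
    )


# Config values copied from Run2.
run2 = {
    "TARGET": "pynq", "HW_VER": "0.0.1",
    "LOG_INP_WIDTH": 3, "LOG_WGT_WIDTH": 3, "LOG_ACC_WIDTH": 5,
    "LOG_BATCH": 0, "LOG_BLOCK": 4,
    "LOG_UOP_BUFF_SIZE": 15, "LOG_INP_BUFF_SIZE": 15,
    "LOG_WGT_BUFF_SIZE": 18, "LOG_ACC_BUFF_SIZE": 17,
}

print(bitstream_name(run2))
```

Comparing the rebuilt name with the path printed by the benchmark is a quick way to confirm that the run picked up the bitstream matching the config being tested.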

Run3:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 5, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 15, "LOG_INP_BUFF_SIZE" : 14, "LOG_WGT_BUFF_SIZE" : 17, "LOG_ACC_BUFF_SIZE" : 16 }

Output:

File "/home/cpre482/482/tvm/src/tir/transforms/storage_rewrite.cc", line 565
TVMError:

An internal invariant was violated during the execution of TVM. Please read TVM’s error reporting guidelines. More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.

Check failed: total_elem * e->elem_type.bits() <= info->max_num_bits (919552 vs. 524288) : Allocation exceed bound of memory tag local.acc_buffer
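
The two numbers in this check can be related to the config with a little arithmetic. This is a minimal sketch, assuming (as I understand the VTA config) that LOG_ACC_BUFF_SIZE is the log2 of the accumulator buffer size in bytes, so the limit in bits is (1 << LOG_ACC_BUFF_SIZE) * 8. The helper names are hypothetical and not part of TVM or VTA:

```python
def acc_buffer_capacity_bits(log_acc_buff_size: int) -> int:
    """Capacity of local.acc_buffer in bits.

    Assumption: LOG_ACC_BUFF_SIZE is log2 of the buffer size in bytes.
    """
    return (1 << log_acc_buff_size) * 8


def min_log_buff_size(requested_bits: int) -> int:
    """Smallest log2 byte size whose capacity covers requested_bits."""
    requested_bytes = (requested_bits + 7) // 8  # round up to whole bytes
    # (n - 1).bit_length() is ceil(log2(n)) for n >= 1, using integers only.
    return (requested_bytes - 1).bit_length()


if __name__ == "__main__":
    # Numbers taken from the error message: (919552 vs. 524288)
    requested, limit = 919552, 524288
    print("limit matches LOG_ACC_BUFF_SIZE=16:",
          acc_buffer_capacity_bits(16) == limit)
    print("32-bit accumulator elements requested:", requested // 32)
    print("minimum LOG_ACC_BUFF_SIZE for this request:",
          min_log_buff_size(requested))
```

Under that assumption, the limit of 524288 bits corresponds to LOG_ACC_BUFF_SIZE = 16, and the requested 919552 bits would need 17, which is the value used in the two runs that pass. This only covers the allocation reported in this error; other workloads or schedules may request a different amount.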

Run4:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 5, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 15, "LOG_INP_BUFF_SIZE" : 13, "LOG_WGT_BUFF_SIZE" : 16, "LOG_ACC_BUFF_SIZE" : 15 }

Output:

File "/home/cpre482/482/tvm/src/tir/transforms/storage_rewrite.cc", line 565
TVMError:

An internal invariant was violated during the execution of TVM. Please read TVM’s error reporting guidelines. More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.

Check failed: total_elem * e->elem_type.bits() <= info->max_num_bits (919552 vs. 524288) : Allocation exceed bound of memory tag local.acc_buffer

Run5:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 5, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 15, "LOG_INP_BUFF_SIZE" : 12, "LOG_WGT_BUFF_SIZE" : 15, "LOG_ACC_BUFF_SIZE" : 14 }

Output:

TVMError:

An internal invariant was violated during the execution of TVM. Please read TVM’s error reporting guidelines. More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.

Check failed: const_size * dtype.bits() <= info->max_num_bits (458752 vs. 131072) : Allocation exceed bound of memory tag local.acc_buffer

Run6:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 5, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 14, "LOG_INP_BUFF_SIZE" : 15, "LOG_WGT_BUFF_SIZE" : 17, "LOG_ACC_BUFF_SIZE" : 16 }

Output:

TVMError:

An internal invariant was violated during the execution of TVM. Please read TVM’s error reporting guidelines. More details can be found here: https://discuss.tvm.ai/t/error-reporting/7793.

Check failed: total_elem * e->elem_type.bits() <= info->max_num_bits (919552 vs. 524288) : Allocation exceed bound of memory tag local.acc_buffer

Run7:

{ "TARGET" : "pynq", "HW_VER" : "0.0.1", "LOG_INP_WIDTH" : 3, "LOG_WGT_WIDTH" : 3, "LOG_ACC_WIDTH" : 4, "LOG_BATCH" : 0, "LOG_BLOCK" : 4, "LOG_UOP_BUFF_SIZE" : 15, "LOG_INP_BUFF_SIZE" : 12, "LOG_WGT_BUFF_SIZE" : 15, "LOG_ACC_BUFF_SIZE" : 14 }

Output:

VTA:

d::_Any_data const&, tvm::runtime::TVMArgs&&, tvm::runtime::TVMRetValue*&&)+0x14) [0x7f06af81a884]
[bt] (2) /home/cpre482/482/tvm/build/libtvm.so(tvm::NodeGetAttr(tvm::runtime::TVMArgs, tvm::runtime::TVMRetValue*)+0x31e) [0x7f06af8187ce]
[bt] (1) /home/cpre482/482/tvm/build/libtvm.so(tvm::ReflectionVTable::GetAttr(tvm::runtime::Object*, tvm::runtime::String const&) const+0x2a2) [0x7f06af817ea2]
[bt] (0) /home/cpre482/482/tvm/build/libtvm.so(+0x5b5cf2) [0x7f06af816cf2]
File "/home/cpre482/482/tvm/src/node/reflection.cc", line 110
File "/home/cpre482/482/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 81, in cfun
rv = local_pyfunc(*pyargs)
File "/home/cpre482/482/tvm/vta/python/vta/transform.py", line 972, in _do_fold
assert lhs.buffer_var.same_as(dst_var)
File "/home/cpre482/482/tvm/python/tvm/runtime/object.py", line 60, in __getattr__
raise AttributeError("%s has no attribute %s" % (str(type(self)), name))
AttributeError: tir.Cast object has no attributed buffer_var
During handling of the above exception, another exception occurred:

AttributeError: <class 'tvm.tir.expr.Cast'> has no attribute buffer_var