I was building VTA for the Intel OpenCL for FPGA flow, using the DE10-Nano as the target board. After about a day, the build process failed with the following error:
mkdir -p /media/alex/Elements/libraries/tvm/3rdparty/vta-hw/hardware/intelfocl/…/…/build/hardware/intelfocl/intelfocl_1x16_i8w8a32_15_15_18_17
cd /media/alex/Elements/libraries/tvm/3rdparty/vta-hw/hardware/intelfocl/…/…/build/hardware/intelfocl/intelfocl_1x16_i8w8a32_15_15_18_17 &&
aoc -v
/media/alex/Elements/libraries/tvm/3rdparty/vta-hw/hardware/intelfocl/src/vta.cl
-I/media/alex/Elements/libraries/tvm/include -I/media/alex/Elements/libraries/tvm/3rdparty/vta-hw/include -I/media/alex/Elements/libraries/tvm/3rdparty/dlpack/include -I/media/alex/Elements/libraries/tvm/3rdparty/dmlc-core/include -DVTA_TARGET=intelfocl -DVTA_LOG_BLOCK_IN=4 -DVTA_LOG_ACC_BUFF_SIZE=17 -DVTA_LOG_BATCH=0 -DVTA_LOG_OUT_WIDTH=3 -DVTA_LOG_INP_BUFF_SIZE=15 -DVTA_LOG_ACC_WIDTH=5 -DVTA_LOG_BLOCK=4 -DVTA_LOG_OUT_BUFF_SIZE=15 -DVTA_LOG_INP_WIDTH=3 -DVTA_LOG_BLOCK_OUT=4 -DVTA_LOG_UOP_BUFF_SIZE=15 -DVTA_HW_VER=0.0.2 -DVTA_LOG_WGT_BUFF_SIZE=18 -DVTA_LOG_WGT_WIDTH=3 -DVTA_LOG_BUS_WIDTH=6 -DVTA_IP_REG_MAP_RANGE=0x1000 -DVTA_FETCH_ADDR=0x43C00000 -DVTA_LOAD_ADDR=0x43C01000 -DVTA_COMPUTE_ADDR=0x43C02000 -DVTA_STORE_ADDR=0x43C03000 -DVTA_FETCH_INSN_COUNT_OFFSET=16 -DVTA_FETCH_INSN_ADDR_OFFSET=24 -DVTA_LOAD_INP_ADDR_OFFSET=16 -DVTA_LOAD_WGT_ADDR_OFFSET=24 -DVTA_COMPUTE_DONE_WR_OFFSET=16 -DVTA_COMPUTE_DONE_RD_OFFSET=24 -DVTA_COMPUTE_UOP_ADDR_OFFSET=32 -DVTA_COMPUTE_BIAS_ADDR_OFFSET=40 -DVTA_STORE_OUT_ADDR_OFFSET=16 -DVTA_COHERENT_ACCESSES=true
-o /media/alex/Elements/libraries/tvm/3rdparty/vta-hw/hardware/intelfocl/…/…/build/hardware/intelfocl/intelfocl_1x16_i8w8a32_15_15_18_17/vta_opencl.aocx
aoc: Environment checks are completed successfully.
aoc: Cached files in /var/tmp/aocl/alex may be used to reduce compilation time
You are now compiling the full flow!!
aoc: Selected default target board de10_nano_sharedonly
aoc: Running OpenCL parser…
aoc: OpenCL parser completed successfully.
aoc: Optimizing and doing static analysis of code…
aoc: Linking with IP library …
Checking if memory usage is larger than 100%
aoc: First stage compilation completed successfully.
Compiling for FPGA. This process may take a long time, please be patient.
Error (170011): Design contains 181671 blocks of type combinational node. However, the device contains only 83820 blocks.
Error (170048): Selected device has 985 RAM location(s) of type LAB. However, the current design needs more than 985 to successfully fit
Error: Cannot fit kernel(s) on device
make: *** [Makefile:49: /media/alex/Elements/libraries/tvm/3rdparty/vta-hw/hardware/intelfocl/…/…/build/hardware/intelfocl/intelfocl_1x16_i8w8a32_15_15_18_17/vta_opencl.aocx] Error 1
How is it possible that, for the same VTA configuration, the design needs more logic resources than the original VTA bitstream for the DE10-Nano (181671 combinational nodes requested against 83820 available)? Is it because the Intel OpenCL for FPGA toolchain doesn’t optimize the IP well?
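To put numbers on the question, here is a small Python sketch that reads the fitter error and the `-DVTA_LOG_*` flags from the log above. It assumes, as I understand the VTA config convention, that the `*_BUFF_SIZE` values are log2 of the buffer size in bytes; the helper names (`overuse_ratio`, `decode_log_flags`, `buffer_sizes_kib`) are made up for this post and are not part of TVM or the Intel tools.

```python
import re

# Error line copied verbatim from the build log above.
ERROR_LINE = (
    "Error (170011): Design contains 181671 blocks of type combinational node. "
    "However, the device contains only 83820 blocks."
)

# The VTA_LOG_* defines copied from the aoc command line above.
LOG_FLAGS = (
    "-DVTA_LOG_BLOCK_IN=4 -DVTA_LOG_ACC_BUFF_SIZE=17 -DVTA_LOG_BATCH=0 "
    "-DVTA_LOG_OUT_WIDTH=3 -DVTA_LOG_INP_BUFF_SIZE=15 -DVTA_LOG_ACC_WIDTH=5 "
    "-DVTA_LOG_BLOCK=4 -DVTA_LOG_OUT_BUFF_SIZE=15 -DVTA_LOG_INP_WIDTH=3 "
    "-DVTA_LOG_BLOCK_OUT=4 -DVTA_LOG_UOP_BUFF_SIZE=15 "
    "-DVTA_LOG_WGT_BUFF_SIZE=18 -DVTA_LOG_WGT_WIDTH=3 -DVTA_LOG_BUS_WIDTH=6"
)


def overuse_ratio(line):
    """Return (needed, available, needed / available) from a 170011 error line."""
    m = re.search(
        r"contains (\d+) blocks of type .+? contains only (\d+) blocks", line
    )
    if m is None:
        raise ValueError("not a resource-overuse error line")
    needed, available = int(m.group(1)), int(m.group(2))
    return needed, available, needed / available


def decode_log_flags(flags):
    """Collect every -DVTA_LOG_<NAME>=<int> define into a dict."""
    return {name: int(val) for name, val in re.findall(r"-D(VTA_LOG_\w+)=(\d+)", flags)}


def buffer_sizes_kib(params):
    """Assumption: *_BUFF_SIZE values are log2 of the size in bytes."""
    return {
        name: (1 << log2_bytes) // 1024
        for name, log2_bytes in params.items()
        if name.endswith("_BUFF_SIZE")
    }


if __name__ == "__main__":
    needed, available, ratio = overuse_ratio(ERROR_LINE)
    print(f"combinational nodes: {needed} needed, {available} available, "
          f"{ratio:.2f}x the device capacity")

    params = decode_log_flags(LOG_FLAGS)
    print(f"GEMM block: {1 << params['VTA_LOG_BATCH']}x{1 << params['VTA_LOG_BLOCK']}")
    for name, kib in sorted(buffer_sizes_kib(params).items()):
        print(f"{name}: {kib} KiB")
```

Under that assumption, the flags decode to a 1x16 block with 32 KiB input, 256 KiB weight, 128 KiB accumulator, 32 KiB output and 32 KiB uop buffers, which matches the `1x16_i8w8a32_15_15_18_17` name in the build directory, and the design asks for roughly 2.17 times the combinational nodes the device has.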