Dear community, I’m using kl_divergence to quantize a fairly large in-house network. I’ve implemented a mechanism to feed it pickled input frames, which I generate from the reference implementation. Since the network inputs are quite large, the resulting (binary-encoded) pickle files grow to around 14 MB per frame. Currently I’m feeding around 157 frames (around 2.2 GB in total), and the quantizer fails with the following error:
tvm._ffi.base.TVMError: Traceback (most recent call last):
[bt] (5) /home/buecs/tvm/build/libtvm.so(TVMFuncCall+0x65) [0x7f969a57db25]
[bt] (4) /home/buecs/tvm/build/libtvm.so(+0x402c34) [0x7f9699d55c34]
[bt] (3) /home/buecs/tvm/build/libtvm.so(+0x402aa7) [0x7f9699d55aa7]
[bt] (2) /home/buecs/tvm/build/libtvm.so(tvm::transform::SequentialNode::operator()(tvm::IRModule const&, tvm::transform::PassContext const&) const+0x389) [0x7f9699d557d9]
[bt] (1) /home/buecs/tvm/build/libtvm.so(tvm::transform::ModulePassNode::operator()(tvm::IRModule const&, tvm::transform::PassContext const&) const+0x10f) [0x7f9699d549af]
[bt] (0) /home/buecs/tvm/build/libtvm.so(+0xc25f8b) [0x7f969a578f8b]
File "/home/buecs/tvm/python/tvm/_ffi/_ctypes/packed_func.py", line 78, in cfun
rv = local_pyfunc(*pyargs)
File "/home/buecs/tvm/python/tvm/relay/quantize/_calibrate.py", line 191, in wrapped_func
input_scale_func = _kl_scale(mod, dataset)
File "/home/buecs/tvm/python/tvm/relay/quantize/_calibrate.py", line 102, in _kl_scale
scales += list(pool.map(_find_scale_by_kl, samples))
File "/usr/lib/python3.6/multiprocessing/pool.py", line 266, in map
return self._map_async(func, iterable, mapstar, chunksize).get()
File "/usr/lib/python3.6/multiprocessing/pool.py", line 644, in get
raise self._value
File "/usr/lib/python3.6/multiprocessing/pool.py", line 424, in _handle_tasks
put(task)
File "/usr/lib/python3.6/multiprocessing/connection.py", line 206, in send
self._send_bytes(_ForkingPickler.dumps(obj))
File "/usr/lib/python3.6/multiprocessing/connection.py", line 393, in _send_bytes
header = struct.pack("!i", n)
struct.error: 'i' format requires -2147483648 <= number <= 2147483647
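For reference, the final struct.error can be reproduced outside of TVM. The traceback shows multiprocessing packing the message length with the "!i" format (a signed 32-bit integer), so any single message larger than 2147483647 bytes cannot be sent. The sketch below only checks that arithmetic. It assumes 14 MB means 14 * 1024**2 bytes and that one task message ends up roughly as large as the whole dataset, which I have not verified. The helper name fits_in_int32_header is made up for illustration.

```python
import struct

# Largest length the "!i" (signed 32-bit, network byte order) format accepts.
INT32_MAX = 2**31 - 1


def fits_in_int32_header(n_bytes):
    """Return True if a payload length can be packed with the "!i" format."""
    try:
        struct.pack("!i", n_bytes)
        return True
    except struct.error:
        return False


# Assumption: ~14 MB per frame and 157 frames, as described in the post.
frame_bytes = 14 * 1024**2
n_frames = 157
total_bytes = frame_bytes * n_frames

# Compare the estimated total against the limit of the length header.
print(total_bytes, INT32_MAX, fits_in_int32_header(total_bytes))
```

Under these assumptions the total is above the 32-bit limit, which would be consistent with the error appearing once the number of frames grows large enough.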
I tried different values of the calibrate_chunk_by parameter, but so far none of the settings I tried removes this error.
Has anyone encountered a similar error before? If so, how could I solve or mitigate it? An expert opinion, for example from @vinx13, would be much appreciated!
Thank you in advance & Best regards, Robert