Cannot allocate memory symbolic tensor shape [?, 1] if tf.import_graph_def() used

After validation, I found that this print() performance issue is related to async execution. Similar topics are discussed here: https://discuss.tvm.apache.org/t/how-could-we-request-a-inference-synchronously/1135 https://discuss.tvm.apache.org/t/how-to-make-get-output-function-faster/5005/2

If print() is used, the time I measure is only the async kernel launch time, and it does not include copying the result back from the device. If I add 'context.sync()', the measured time lines up with the case without print().
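To illustrate why a timer that stops before synchronization under-reports the cost, here is a minimal sketch in plain Python. It does not use TVM at all: a thread pool stands in for the device queue, `fake_kernel` is a hypothetical placeholder for the real kernel, and `future.result()` plays the role that 'context.sync()' plays in my case. Treat it as an analogy under those assumptions, not as how TVM is implemented.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_kernel(duration_s):
    # Hypothetical stand-in for device work; not a TVM call.
    time.sleep(duration_s)
    return 42


def time_launch_only(executor, duration_s):
    # Stop the timer right after submitting: this measures only the "launch".
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)  # returns immediately
    elapsed = time.perf_counter() - start
    return elapsed, future


def time_with_sync(executor, duration_s):
    # Stop the timer only after the work has finished, which is the
    # analogue of calling a device sync before reading the clock.
    start = time.perf_counter()
    future = executor.submit(fake_kernel, duration_s)
    future.result()  # blocks until the work is done
    return time.perf_counter() - start


def run_demo(duration_s=0.2):
    with ThreadPoolExecutor(max_workers=1) as executor:
        launch_only, pending = time_launch_only(executor, duration_s)
        pending.result()  # drain the queue so the second timing is clean
        synced = time_with_sync(executor, duration_s)
    return launch_only, synced


if __name__ == "__main__":
    launch_only, synced = run_demo()
    print(f"launch only: {launch_only:.6f} s, with sync: {synced:.6f} s")
```

The launch-only number is much smaller than the synced one even though the same work runs in both cases, which is the same pattern as the two timings I saw.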

Why print() causes this problem still needs a bit more digging.