Hi all,
How does TVM flush the cache between the execution of consecutive fused groups?
I ran a model in two variants: in one, the tensor passed between two fused groups is int8, and in the other it is int16. When I profile them, both variants have the same execution time. I would expect the int8 variant to be somewhat faster, since loading and storing that tensor should move half as many bytes as with int16.
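To make the expectation concrete, here is a rough sketch of the memory traffic I have in mind. The shape below is a made-up example for illustration, not the actual tensor from my model, and the helper `tensor_bytes` is just a hypothetical name.

```python
from math import prod

# Hypothetical shape of the tensor passed between the two fused groups
# (an assumption for illustration; the real shape depends on the model).
shape = (1, 64, 56, 56)

# Storage size of one element for each dtype being compared.
BYTES_PER_ELEMENT = {"int8": 1, "int16": 2}


def tensor_bytes(shape, dtype):
    """Return the size in bytes of a dense tensor with the given shape and dtype."""
    return prod(shape) * BYTES_PER_ELEMENT[dtype]


int8_bytes = tensor_bytes(shape, "int8")
int16_bytes = tensor_bytes(shape, "int16")

# The first group stores the tensor once and the second group loads it once,
# so the int16 variant should move twice as many bytes in each direction.
print(f"int8:  {int8_bytes} bytes")
print(f"int16: {int16_bytes} bytes")
print(f"ratio: {int16_bytes / int8_bytes}")
```

If the tensor stays resident in cache between the two groups, this difference in bytes might not show up in the timings at all, which is why I am asking about cache flushing.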
Thanks.