I get the following relay IR printed by calling build_module.lower():

    // func_name is "fused_sqrt_2"
    produce T_sqrt {
      parallel (ax0, 0, 128) {
        T_sqrt[ax0] = sqrt(placeholder[ax0])
      }
    }

I want to know what really happens with "parallel (ax0, 0, 128)". Will there be 128 tasks to be run at runtime? If I set TVM_NUM_THREADS=8, how does TVM schedule these 128 tasks?