Does anyone have tips, or a sequence of steps they use, to troubleshoot performance slowdowns that show up during tuning?
This weekend I was playing around with the TVMC driver. It makes it pretty convenient to take an existing ONNX file, tune it with either autotvm or the new autoscheduler, and then compare performance against the untuned model.
The command lines I was using were typically of the following form, with fairly basic options:
tvmc tune --target rocm --output tunedbfile.json --enable-autoscheduler onnxfile
tvmc compile --target rocm --tuning-records tunedbfile.json --output compilefile.tar onnxfile
tvmc run --device rocm --fill-mode random --print-time --repeat 100 compilefile.tar
Sometimes the tuning helped significantly, but there were also times when tuning seemed to fall off a cliff and produce a much slower program.
What are the best ways to troubleshoot and figure out what is going on when performance degrades significantly? Any tips or tricks? I’ve found that I can use the --profile option on the tvmc run command to at least see which kernels become much slower. However, I’m not sure where or how to look next.
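To make the comparison concrete, here is a minimal sketch of one way to line up two profiles. It assumes the per-kernel times have been copied by hand from the --profile output of an untuned build and a tuned build into two dicts. The kernel names and numbers are made up for illustration, and find_regressions is a hypothetical helper, not part of TVM.

```python
def find_regressions(baseline, tuned, threshold=1.5):
    """Return kernels whose tuned time is at least `threshold` times the baseline.

    Both arguments map a kernel name to its mean time (any consistent unit).
    The result is a list of (name, baseline_time, tuned_time, ratio) tuples,
    worst regression first.
    """
    regressions = []
    for name, base_time in baseline.items():
        tuned_time = tuned.get(name)
        if tuned_time is None or base_time <= 0:
            # Kernel not present in the tuned profile, or no usable baseline.
            continue
        ratio = tuned_time / base_time
        if ratio >= threshold:
            regressions.append((name, base_time, tuned_time, ratio))
    regressions.sort(key=lambda r: r[3], reverse=True)
    return regressions


# Hypothetical per-kernel mean times in microseconds (illustrative values only).
baseline_us = {
    "fused_nn_conv2d_add_nn_relu": 120.0,
    "fused_nn_conv2d_add_nn_relu_1": 95.0,
    "fused_nn_dense_add": 40.0,
}
tuned_us = {
    "fused_nn_conv2d_add_nn_relu": 60.0,
    "fused_nn_conv2d_add_nn_relu_1": 380.0,
    "fused_nn_dense_add": 38.0,
}

for name, base, tuned_t, ratio in find_regressions(baseline_us, tuned_us):
    print(f"{name}: {base:.1f} us -> {tuned_t:.1f} us ({ratio:.1f}x slower)")
```

This only ranks the regressed kernels so the worst ones stand out; it does not explain why a schedule is slow. It also assumes kernel names match between the two builds, which may not always hold.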
As a practical example, the inception model exported from the torchvision library works pretty well after tuning, while resnet50 exported from torchvision slows down, with a few convolution kernels becoming much slower.
I’m not quite sure where or how to dig deeper into what might be going on.