Define hardware-specific Matmul::Config

The Matmul schedule implementation in dlight/gpu/matmul.py uses a Matmul::Config to define the parameters for the schedule. Currently this config is selected per target (cuda, rocm, opencl…). However, it would make more sense to choose it per hardware device, since devices under the same target can differ a lot. For example, different Vulkan devices may need different config parameters for optimal performance.
Here are my questions for discussion: (1) How can we make Matmul::Config hardware-specific? With a config file? (2) Who can make the change? I'm willing to help.
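
To make question (1) more concrete, here is a rough sketch of what a config-file lookup could look like. It is only an illustration, not how dlight works today: `MatmulConfig` is a stand-in dataclass with a few illustrative fields, the JSON layout, the device name `example-igpu`, and the function `select_config` are all hypothetical, and the numbers are placeholders rather than tuned values. It also assumes a device name can be obtained from the target at schedule time.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class MatmulConfig:
    """Stand-in for Matmul::Config; fields and defaults are illustrative only."""
    block_size_x: int = 8
    block_size_y: int = 8
    micro_size_k: int = 8
    unroll: int = 256


# Hypothetical config file content: target kind -> device name -> overrides.
# "default" holds the per-target values; device entries override them.
OVERRIDES_JSON = """
{
  "vulkan": {
    "default": {"block_size_x": 8, "block_size_y": 16},
    "example-igpu": {"block_size_x": 16, "unroll": 64}
  }
}
"""


def select_config(target_kind, device_name, overrides):
    """Pick a config: built-in defaults < target defaults < device overrides."""
    per_target = overrides.get(target_kind, {})
    params = dict(per_target.get("default", {}))
    params.update(per_target.get(device_name, {}))

    # Reject misspelled keys early instead of silently ignoring them.
    known = {f.name for f in fields(MatmulConfig)}
    unknown = set(params) - known
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return MatmulConfig(**params)


overrides = json.loads(OVERRIDES_JSON)
igpu_cfg = select_config("vulkan", "example-igpu", overrides)
other_cfg = select_config("vulkan", "some-other-gpu", overrides)
cuda_cfg = select_config("cuda", "any-device", overrides)
```

With this layering, a device without its own entry falls back to the per-target values, and a target without any entry falls back to the built-in defaults, so existing behavior would not change unless someone adds an override.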

I noticed this issue when measuring llama2 prefill time on an AMD iGPU using the Vulkan target. The default Matmul::Config values don't give optimal prefill time on the AMD iGPU, maybe because they were tuned for different GPU hardware. Other TVM users might run into a similar issue when running on hardware other than what the contributors tested on.

Any suggestions on how to make Matmul::Config hardware-specific? Thanks.