[auto-scheduler] Scheduling custom sparse conv2d implementation

I have been exploring sparse computations in TVM, and have been making sparse implementations of conv2d (the current one in the main repo does not cover enough cases for my needs).

I was pleased to see that newer versions of TVM now support Ansor (auto-scheduling) for sparse computations.

Thus, I rewrote my algorithm to support this. There is a tutorial on “sparse sketch rules”, which can apparently help the auto-scheduler find good schedules.

However, when I attempt to run the auto-scheduler, it does not generate any valid schedules. I have tried playing around with the sketch rules; however, my understanding is that these only give the auto-scheduler a “better start”.

@jcf94 you wrote the tutorial on sparse sketch rules, and GitHub user @yuchaoli also has some key commits on this topic (though I can’t find their TVM forums handle). Anything I’m missing, or resources you could point me to?

I’ve made a fork here with my code. Commit bab9e0ba77 is where I add my algorithm, defined under sparse_conv2d_direct.py.

Commit c20c64f2e is where I explore sketch rules.

This simple gist is a single layer sparse CNN that I’ve been using to test things.

Thoughts on what I may be missing here?

Thanks!

@Wheest Sorry, it’s actually been a really long time since my last visit to the sparse topic.

Does the code in [a tutorial on “sparse sketch rules”] still work now? If not, I think I should be responsible for fixing it.

The tutorial you linked [Auto-scheduling Sparse Matrix Multiplication on CPU with Custom Sketch Rule — tvm 0.10.dev0 documentation] tunes a single sparse matrix multiplication op. For working on a complete network, there is actually a use_sparse option in the tutorial [Auto-scheduling a Neural Network for x86 CPU — tvm 0.10.dev0 documentation]; you can try that first, and if it doesn’t work now, I should also be responsible for fixing it.

It was still an experimental feature at the time I wrote those tutorials, which is why, as you can see in my sparse network tutorial, we even used random sparse weights.

Many thanks. Yes, I can confirm that the tutorial code still works, as does tuning a sparse CNN using the current default sparse conv2d implementation.

However, that sparse conv2d implementation only covers a limited set of cases (e.g., only 3x3 and 1x1 kernels, with no dilation or padding).

Hence, I am trying to develop some more general-purpose sparse implementations. I actually started this back in 2020, but had to put it on hold due to other commitments.

My untuned code works fine, but auto-scheduling my sparse code does not seem to work: I can generate a workload DAG for Ansor, but every generated schedule is invalid. So I was wondering if there is something I am missing.

I’ve made some progress on this issue, namely discovering that the auto-scheduler was not processing the names of my sparse arrays properly.

When running a standalone conv2d, in a similar way to the standalone sparse-dense layer in the tutorial, you need to pass the array names to the SearchTask via the task_inputs argument. However, this was silently failing for me. Tracing into the prepare_runner_args function in auto_scheduler/measure.py, I was able to see that the arrays were not being handled properly.
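For reference, the registration pattern in question looks roughly like the snippet below, adapted from the sparse-dense tutorial. This is a minimal sketch: the sizes are illustrative, and for a custom conv2d the compute function and name prefix would need to change to match whatever the try_get_*_sparse_input lookup in topi parses.

```python
import tvm
from tvm import te, topi, auto_scheduler, runtime
from tvm.topi.sparse.utils import random_bsr_matrix


@auto_scheduler.register_workload
def sparse_dense(M, N, K, w_data_shape, w_indices_shape, w_indptr_shape, dtype):
    # A simplified version of the tutorial's workload: X @ W^T with W in BSR format.
    X = te.placeholder(shape=(M, K), dtype=dtype)
    W_data = te.placeholder(shape=w_data_shape, dtype=dtype)
    W_indices = te.placeholder(shape=w_indices_shape, dtype="int32")
    W_indptr = te.placeholder(shape=w_indptr_shape, dtype="int32")
    out = topi.nn.sparse_dense(X, W_data, W_indices, W_indptr)
    return [X, W_data, W_indices, W_indptr, out]


# Illustrative sizes and a random BSR weight matrix.
M, N, K, BS_R, BS_C, density = 128, 3072, 768, 16, 1, 0.6
W_sp_np = random_bsr_matrix(N, K, BS_R, BS_C, density=density, dtype="float32")

# The prefix encodes the sparse shapes; its format has to match what the
# sparse-input lookup in topi/nn/sparse.py computes, or the arrays are
# silently replaced with random data at measurement time.
prefix = "sparse_dense_bsr_%d_%d_%d_%d_%d_%d_" % (
    N, K, BS_R, BS_C, W_sp_np.indices.shape[0], W_sp_np.indptr.shape[0]
)

task = auto_scheduler.SearchTask(
    func=sparse_dense,
    args=(M, N, K, W_sp_np.data.shape, W_sp_np.indices.shape, W_sp_np.indptr.shape, "float32"),
    target="llvm",
    task_inputs={
        # These names must line up with the sparse placeholders in the compute DAG.
        prefix + "W_data": runtime.ndarray.array(W_sp_np.data),
        prefix + "W_indices": runtime.ndarray.array(W_sp_np.indices),
        prefix + "W_indptr": runtime.ndarray.array(W_sp_np.indptr),
    },
    task_inputs_save_to_file=True,
)
```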

To fix this, I had to edit the try_get_conv2d_sparse_input function in topi/nn/sparse.py so that it matches the characteristics of my own general-purpose sparse conv2d implementation.

My issue right now is trying to understand the rules/reasoning behind sketch rules. There are a couple of examples in topi/sparse/utils.py, as well as in the sparse auto-scheduling tutorial. However, it is still unclear to me from the code, comments, and documentation:

  • what exactly is a sparse sketch rule supposed to do?
  • what are the primitives?
  • what conditions need to be met for a sketch rule to be valid?
  • what are the advantages of a “more optimized” sketch rule, and how would it compare to a MWE (minimum working example) sketch rule?

This is a barrier to producing my own auto-scheduled sparse computations, as I am still trying to reverse engineer existing sketch rules to understand their behaviour. So far, my own attempts at sketch rules cause the auto-scheduler to fail with “No valid state found in this search round. Check if it has traversed all of the search space.”
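For concreteness, here is my current understanding of the basic shape of a custom sketch rule, distilled from the tutorial’s SparseDense rule. The op tag "my_sparse_conv2d" is a placeholder for whatever tag your compute definition sets, and the pass-through apply_func is the MWE case from my question above (which, per my experience, may still not yield valid states for sparse loops on its own):

```python
from tvm import auto_scheduler


def meet_condition_func(search_policy, state, stage_id):
    # Wrap the raw state so we can inspect the op tag of this stage.
    state = auto_scheduler.loop_state.State(state, search_policy.search_task.compute_dag)
    if state.stages[stage_id].op.tag in ["my_sparse_conv2d"]:  # placeholder tag
        # Apply our rule to this stage and skip the built-in sketch rules.
        return auto_scheduler.PreloadCustomSketchRule.APPLY_AND_SKIP_REST
    return auto_scheduler.PreloadCustomSketchRule.PASS


def apply_func(search_policy, state, stage_id):
    # A sketch rule returns a list of [state, next_stage_id] pairs; this
    # minimal version just passes the state through untouched.
    s0 = auto_scheduler.loop_state.State(state, search_policy.search_task.compute_dag)
    return [[s0.state_object, stage_id - 1]]


# `task` is the SearchTask constructed earlier.
search_policy = auto_scheduler.SketchPolicy(
    task,
    program_cost_model=auto_scheduler.XGBModel(),
    init_search_callbacks=[
        auto_scheduler.PreloadCustomSketchRule(meet_condition_func, apply_func, "MySparseConv2d")
    ],
)
```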

I believe I now have a working sketch rule for my case. Essentially, the approach I have taken is to split and “reorder” (without actually changing the loop order) my main loops, and then the auto-scheduler does really well (>80x speedup).

My above questions still stand, though. My intuition is that, by default, iterators involving sparse arrays are not the right object type for the auto-scheduler to understand, and the sketch rule transforms them into a type that it can work with.
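For illustration, the split-and-“reorder” trick looks something like this inside apply_func. The loop nest (n, oc, oh, ow, elem_idx) is hypothetical, standing in for my conv2d’s actual loops:

```python
def apply_func(search_policy, state, stage_id):
    s0 = auto_scheduler.loop_state.State(state, search_policy.search_task.compute_dag)
    conv = s0.stage_ops[stage_id]

    # Hypothetical loop nest (n, oc, oh, ow, elem_idx), where elem_idx walks
    # the sparse structure. Split only the dense spatial loops; [None] leaves
    # the tile size as a free parameter for the search to tune...
    n, oc, oh, ow, elem_idx = s0[conv].iters
    oh_o, oh_i = s0.split(conv, oh, [None])
    ow_o, ow_i = s0.split(conv, ow, [None])

    # ...then "reorder" them, keeping the original overall order. This is a
    # no-op in terms of loop order, but it leaves the state with tiled dense
    # iterators that the evolutionary search knows how to mutate further.
    s0.reorder(conv, [n, oc, oh_o, oh_i, ow_o, ow_i, elem_idx])

    return [[s0.state_object, stage_id - 1]]
```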