Do you have some paper or more information on what you have been working on? I would love to know more about it.
I don’t think it is worth a paper, but a small, clear tutorial might do.
I am thinking of publishing a small tutorial on this within TVM. The main goal is to highlight MetaSchedule’s autotensorization feature and how to use it to further tune kernels and nets in custom ways (e.g. it can showcase a few simple declarations for older SSE2/SSE3 constructs as a sample). The highlights could be:
- how to declare a TIR search template for the autotensorizer
- how to declare the template’s call/implementation to tie it to the fast ISA/intrinsics
- how to tune neural-network operators (from an imported graph) with MetaSchedule’s autotensorizer enabled
- how to inspect the IR during the MetaSchedule tuning process (in human-readable form)
- how to check/select/filter the autotensorized variants of the tuned net (regardless of performance)
The autotensorizer can also be used to insert more complex, one-shot HW-supported operations, not only classical fast ISA instructions.
As a consolation for VTA and micro being gone, the last part of the tutorial could include a small showcase of how to construct a custom “vector instruction/block” (e.g. an instantaneous HW dot product) as a hypothetical ISA extension (e.g. a futuristic RISC-V extension/block), and how to declare the TIR search template for it together with its real or virtual implementation. In our case, to run the simulation on a local PC, the implementation would be a C-equivalent function or a call into a Verilated model.
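To make the idea concrete, here is a minimal sketch in plain Python (no TVM involved) of what such a pairing amounts to. Everything in it is hypothetical and only for illustration: `hw_dot4` stands in for the made-up 4-lane dot-product block, `LANES` is an assumed vector width, `matvec_scalar` plays the role of the loop nest a search template would describe, and `matvec_tensorized` shows the same computation after the inner reduction is replaced by the one-shot call. It is not the TVM API, just the concept.

```python
# Hypothetical one-shot dot-product "instruction", emulated in software.
# None of these names come from TVM; they are placeholders for illustration.

LANES = 4  # assumed vector width of the made-up HW block


def hw_dot4(a, b, acc=0):
    """Software stand-in for the hypothetical HW dot-product block.

    Multiplies LANES pairs of integers and adds the sum to `acc`.
    """
    assert len(a) == LANES and len(b) == LANES
    return acc + sum(int(x) * int(y) for x, y in zip(a, b))


def matvec_scalar(A, x):
    """'Description' side: the plain scalar loop nest to be matched."""
    out = []
    for row in A:
        acc = 0
        for k in range(len(x)):
            acc += row[k] * x[k]
        out.append(acc)
    return out


def matvec_tensorized(A, x):
    """'Implementation' side: the reduction loop is split by LANES and
    every chunk is handled by a single hw_dot4 call."""
    assert len(x) % LANES == 0, "reduction length must be a multiple of LANES"
    out = []
    for row in A:
        acc = 0
        for ko in range(0, len(x), LANES):
            acc = hw_dot4(row[ko:ko + LANES], x[ko:ko + LANES], acc)
        out.append(acc)
    return out


A = [[1, 2, 3, 4, 5, 6, 7, 8],
     [-1, -2, -3, -4, 5, 6, 7, 8]]
x = [1, 1, 1, 1, 2, 2, 2, 2]

# Both variants must agree, otherwise the replacement would be invalid.
assert matvec_scalar(A, x) == matvec_tensorized(A, x)
```

In the real tutorial the scalar version would be expressed as a TIR description and `hw_dot4` would become the C-equivalent or Verilated call, but the equivalence check between the two sides stays the same in spirit.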
If you think this is a good idea and don’t mind, I can Cc you on the draft PR.
My apologies if I derailed the subject a bit, but I tried to offer an alternative to the missing VTA/micro stuff here.