[VTA] Instruction set architecture extensions

We would like to enhance the configurability of VTA to support new workloads and additional points on the performance/cost Pareto curve(s). VTA can best demonstrate the flexibility of the TVM framework by becoming a more capable accelerator.

Today, VTA has many targets: sim, tsim, pynq, de10nano, intelfocl, ultra96, and zcu104. The last five targets may not be accessible to all developers since they require hardware and/or licensed software. Perhaps for the same reasons, these targets are also not part of the continuous integration (CI) system.

Today’s instruction set architecture (ISA) is insufficient for enhancing the configurability of VTA. Field widths may need to be made flexible, and new instructions or opcodes may need to be introduced. The sim and tsim targets can be enhanced to support such modifications, and the tests for those targets can be extended accordingly. However, what should we do about the remaining targets?

Here are two options:

  1. Extract a subset of the build flow for each target that can be run by all users, then add it to the CI system. ISA modifications are permitted as long as all CI tests pass. Since some targets will only have partial coverage of the build/deployment flow, some ISA changes might break certain targets, and the breakage would go unnoticed until a developer/user with access to that target tries out the entire build process and reports a bug. Perhaps this is already happening today?

  2. Add ISA modifications in an incremental manner. For instance, a new ‘ENABLE_FEATURE_XYZ’ flag could be added to the json config file. Certain targets could reject these enhanced json files as unsupported, but all targets would still support the current set of json files. Compiler and runtime support would be added for all new features. Interaction between multiple features on certain targets might become complex to handle.
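
To make option 2 a bit more concrete, here is a rough sketch of how a target could reject a json config that requests features it does not support. This is only an illustration, not existing VTA code: the `SUPPORTED_FEATURES` table, the `check_config` helper, and the convention that feature flags start with `ENABLE_` are all assumptions made for the example.

```python
import json

# Hypothetical table: which optional ISA features each target accepts.
# Targets that only support today's ISA map to an empty set.
SUPPORTED_FEATURES = {
    "sim": {"ENABLE_FEATURE_XYZ"},
    "tsim": {"ENABLE_FEATURE_XYZ"},
    "pynq": set(),
}


def check_config(config_text):
    """Parse a json config and reject feature flags the target lacks.

    Assumes the config names its target under a "TARGET" key and that
    every optional feature is a boolean key starting with "ENABLE_".
    """
    cfg = json.loads(config_text)
    target = cfg["TARGET"]
    # Only flags that are present and set to a truthy value count.
    requested = {k for k, v in cfg.items() if k.startswith("ENABLE_") and v}
    unsupported = requested - SUPPORTED_FEATURES.get(target, set())
    if unsupported:
        raise ValueError(
            "target %r does not support: %s" % (target, sorted(unsupported))
        )
    return cfg
```

With this shape, a config without any `ENABLE_` flag passes on every target, so the current set of json files keeps working. The feature-interaction concern shows up once the per-target sets hold more than one flag and some combinations are invalid.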

I’m interested to know the opinions of the community about this issue.

To provide more context, the concrete ISA extension which might benefit from this decision is permitting 64-bit uops. Today, 32-bit uops are not wide enough to support larger scratchpads. With the recent tsim changes in tvm-vta!27, tvm-vta!30, and tvm-vta!32, scratchpad size becomes a bottleneck for certain workloads.
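
As a back-of-the-envelope illustration of why uop width limits scratchpad size, the sketch below assumes a uop packs one index into each of three scratchpads (accumulator, input, weight), so the uop must hold the sum of the three index widths. The helper names and the depths used are illustrative assumptions, not values taken from a specific VTA config.

```python
def uop_bits_needed(log_acc_depth, log_inp_depth, log_wgt_depth):
    """Bits needed if a uop holds one index per scratchpad (assumed layout)."""
    return log_acc_depth + log_inp_depth + log_wgt_depth


def fits(uop_width, log_acc_depth, log_inp_depth, log_wgt_depth):
    """True if all three scratchpad indices fit in a uop of uop_width bits."""
    needed = uop_bits_needed(log_acc_depth, log_inp_depth, log_wgt_depth)
    return needed <= uop_width


# Illustrative depths: 2**11, 2**11, and 2**10 entries need 32 bits in total.
baseline = (11, 11, 10)
# Doubling every scratchpad adds one index bit each, for 35 bits in total.
doubled = tuple(bits + 1 for bits in baseline)

print(fits(32, *baseline))  # True
print(fits(32, *doubled))   # False
print(fits(64, *doubled))   # True
```

Under this assumption, a 32-bit uop leaves no room to grow any scratchpad without shrinking another, while a 64-bit uop leaves plenty of headroom.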

@vegaluis @thierry