Generating C code for a custom neural network accelerator

kslavka · July 3, 2022, 7:09pm

Hi all, I am developing custom NN accelerator hardware. I also provide a C API layer that implements basic operators such as conv2d, dense, depthwise conv2d, and others. This implementation uses the NN accelerator features in the most optimized way.

I would like to use TVM to parse any frontend (tflite, onnx, keras, …) and eventually generate C code that calls my custom C API for the operators that I’ve implemented and uses the default implementation for the ones not implemented yet.

I would appreciate some guidance or explanation of what would be the correct way to do the above.
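
The split described above (offload the operators the accelerator API covers, fall back to the default implementation otherwise) can be sketched in plain Python. This is only an illustration of the partitioning idea, not TVM code: the operator names, the `ACCEL_OPS` set, and the `partition` helper are all hypothetical.

```python
from itertools import groupby

# Hypothetical set of operators covered by the accelerator C API.
ACCEL_OPS = {"conv2d", "dense", "depthwise_conv2d"}


def partition(ops, supported=ACCEL_OPS):
    """Group consecutive ops into (target, [ops]) regions.

    Runs of supported ops go to "accel"; everything else goes to "default".
    """
    regions = []
    for offloaded, run in groupby(ops, key=lambda op: op in supported):
        regions.append(("accel" if offloaded else "default", list(run)))
    return regions


# Toy operator sequence standing in for an imported model graph.
graph = ["conv2d", "relu", "depthwise_conv2d", "conv2d", "softmax"]
regions = partition(graph)
for target, run in regions:
    print(target, run)
```

Grouping consecutive supported operators into one region keeps the number of transitions between the accelerator and the default implementation small, which is the same motivation behind region merging in graph-partitioning flows.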

Lyken17 · July 7, 2022, 7:29pm

You may refer to this code snippet

import torch
import torch.nn as nn

import tvm
from tvm import relay, TVMError


net = nn.Sequential(
    nn.Linear(10, 20),
    nn.ReLU(),
    nn.Linear(20, 10),
    nn.ReLU(),
)

data = torch.randn(1, 10)
shape_list = [("input0", data.shape)]
# Trace the model so the Relay PyTorch frontend can import it.
scripted_model = torch.jit.trace(net, data).eval()
fmod, params = relay.frontend.from_pytorch(scripted_model, shape_list)

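# Holds the most recent PrimFunc seen by the pass below.
extern_prim_fn = None

@tvm.tir.transform.prim_func_pass(opt_level=0)
def print_tir(f, mod, ctx):
    global extern_prim_fn
    print(f)
    extern_prim_fn = f
    # A PrimFunc pass must return the (possibly transformed) function.
    return f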

try:
    with tvm.transform.PassContext(
        opt_level=3, config={"tir.add_lower_pass": [(3, print_tir)]}
    ):
        lib = relay.build(fmod, target="llvm")
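except TVMError:
    # Continue only if at least one PrimFunc was captured before the failure.
    if extern_prim_fn is None:
        raise

# Build the captured PrimFunc with the C backend and print the generated source.
rt_mod = tvm.build(extern_prim_fn, target="c")
print(rt_mod.get_source())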

fPecc · July 8, 2022, 7:52am

I would advise you to take a look at how the VTA accelerator is integrated into TVM, and also the Ethos-U. I also think this RFC would make your life soooo much easier

kslavka · July 12, 2022, 5:59am

Thanks a lot, I think UMA (Universal Modular Accelerator) is what I’m looking for. It is still in development, though. I will post my findings later in this thread.

cgerum · July 14, 2022, 1:19pm

@kslavka The pull request for the UMA baseline infrastructure is currently being integrated into the TVM toolchain: [UMA] UMA v1.0 by MichaelJKlaiber · Pull Request #12087 · apache/tvm · GitHub. It will surely need a few rounds of review before it finally lands in main, and I think it needs a few more additions to actually improve your user experience significantly over current BYOC.

For the further adoption of UMA, it would be great if you could provide an example of the C code you would like TVM to generate.
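
As one hypothetical illustration of the kind of output being discussed, the sketch below renders a C function that calls an accelerator API for covered operators and a default kernel for the rest. Everything here is an assumption made for illustration: the symbol names (`my_accel_conv2d`, `my_accel_dense`, `default_<op>`), the two-buffer call signature, and the `emit_c` helper are not part of TVM, UMA, or the original poster's API.

```python
# Hypothetical mapping from operator name to an accelerator C API symbol.
ACCEL_API = {"conv2d": "my_accel_conv2d", "dense": "my_accel_dense"}


def emit_c(ops, func_name="run_model"):
    """Render a C function that runs `ops` in order.

    Operator i reads buf[i] and writes buf[i + 1]. Operators without an
    accelerator entry call a (hypothetical) default_<op> kernel instead.
    """
    lines = [f"int {func_name}(float *buf[]) {{"]
    for i, op in enumerate(ops):
        callee = ACCEL_API.get(op, f"default_{op}")
        lines.append(f"  if ({callee}(buf[{i}], buf[{i + 1}]) != 0) return -1;")
    lines.append("  return 0;")
    lines.append("}")
    return "\n".join(lines)


source = emit_c(["conv2d", "relu", "dense"])
print(source)
```

The emitted text is a fragment: it assumes the called kernels are declared elsewhere and that each returns 0 on success.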
