[pre-RFC] TVM Explorer Infrastructure

areusch · September 21, 2022, 10:22pm

Cool, thanks for the explanations!

The Var thing I’m discussing here is not exactly a simple tweak to this proposal; it’s probably a significant enough lift that it would deserve its own RFC. So, just to clarify: I’m not necessarily asking you to change your approach. However, I did want to raise this question to a) build more support for the idea, b) see whether it is potentially easier to pursue than adding SIBuilder support to the remaining passes, and c) think through whether it’d be easier to maintain in the long run.

The basic idea is this: consider your one-to-many conversion example. A common challenge we face in TVM is determining which Relay Exprs correspond to one another before and after a pass. To pick a concrete example, suppose we introduce a pass that outlines part of a function (say it outlines the Pack conversion from your previous example). Before executing the pass, suppose we start from your example:

chunit:

```
def @main(%input: Tensor[(?, ?, 3, 1), float32]) {
    %0 = shape_of(%input, dtype="int32") /* Shape */;
    %1 = strided_slice(%0, …) /* strided_slice */;
    %2 = squeeze(%1) /* strided_slice */;
    # the Pack Op conversion starts from here
    %3 = expand_dims(%2, axis=0) /* stack */;
    %4 = expand_dims(3, axis=0) /* stack */;
    %5 = expand_dims(3, axis=0) /* stack */;
    %6 = (%3, %4, %5) /* stack */;
    %7 = concatenate(%6) /* stack */;
    %7
}
```

Now suppose we run the outliner and arrive at:

```
def @outlined_pack(%i1) {
    %0 = expand_dims(%i1, axis=0) /* stack */;
    %1 = expand_dims(3, axis=0) /* stack */;
    %2 = expand_dims(3, axis=0) /* stack */;
    %3 = (%0, %1, %2) /* stack */;
    %4 = concatenate(%3) /* stack */;
    %4
}

def @main(%input: Tensor[(?, ?, 3, 1), float32]) {
    %0 = shape_of(%input, dtype="int32") /* Shape */;
    %1 = strided_slice(%0, …) /* strided_slice */;
    %2 = squeeze(%1) /* strided_slice */;
    # the Pack Op conversion starts from here
    %3 = @outlined_pack(%2);
    %3
}
```

Now the question is: after running the pass, does the new @main contain a Relay var that holds the value of the original %7? The answer is yes: it’s %3. To perform this outlining, a pass (e.g. an ExprMutator) had to capture the subgraph containing %3 through %7, replace it with a call to the new function, and bind the result to %3. The pass therefore knows that the new %3 is equivalent to the old %7. Similarly to how Span information is filled here, it could record some kind of backreference to %7 when defining %3. This could even just be stored as a Map:

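```cpp
// Inside namespace tvm::relay; "var_map" is a proposed function attribute, not an existing one.
using VarMap = Map<Var, Var>;  // keys are originally-imported Vars, values are the equivalent Vars now inside f.
Function f = Downcast<Function>(mod->Lookup("main"));  // mod is an IRModule
Optional<VarMap> var_map = f->GetAttr<VarMap>("var_map");
```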
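To make the bookkeeping concrete, here is a minimal, self-contained sketch of an outliner that records a VarMap while it rewrites. It makes three simplifying assumptions:

- It models an A-normal-form body as a plain list of bindings rather than using TVM's real ExprMutator machinery.
- `Binding`, `OutlineResult` and `Outline` are hypothetical names.
- The outlined region has a single live-out value.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <set>
#include <string>
#include <vector>

// Hypothetical toy model of an A-normal-form function body: each binding is
// "var = op(args...)". This is not TVM's data structure; names are illustrative.
struct Binding {
  std::string var;                // e.g. "%7"
  std::string op;                 // e.g. "concatenate"
  std::vector<std::string> args;  // vars (start with '%') or literals
};

struct OutlineResult {
  std::vector<Binding> main_body;              // rewritten caller
  std::vector<Binding> outlined_body;          // body of the new function (vars not renamed)
  std::map<std::string, std::string> var_map;  // original var -> equivalent var in the caller
};

// Outlines bindings [first, last] into a call to `callee`, bound to `result_var`.
// Assumes the region's only live-out value is the var bound at `last`.
OutlineResult Outline(const std::vector<Binding>& body, std::size_t first, std::size_t last,
                      const std::string& callee, const std::string& result_var) {
  assert(first <= last && last < body.size());
  OutlineResult r;
  for (std::size_t i = 0; i < first; ++i) {
    r.main_body.push_back(body[i]);
    r.var_map[body[i].var] = body[i].var;  // untouched bindings map to themselves
  }
  std::set<std::string> region_vars;
  for (std::size_t i = first; i <= last; ++i) region_vars.insert(body[i].var);
  std::vector<std::string> call_args;  // free vars of the region become call arguments
  for (std::size_t i = first; i <= last; ++i) {
    r.outlined_body.push_back(body[i]);
    for (const auto& a : body[i].args) {
      bool is_var = !a.empty() && a[0] == '%';
      if (is_var && !region_vars.count(a) &&
          std::find(call_args.begin(), call_args.end(), a) == call_args.end())
        call_args.push_back(a);
    }
  }
  // The pass knows result_var holds the value of the old live-out var: record it.
  r.main_body.push_back({result_var, "@" + callee, call_args});
  r.var_map[body[last].var] = result_var;
  for (std::size_t i = last + 1; i < body.size(); ++i) {
    Binding b = body[i];
    for (auto& a : b.args)
      if (a == body[last].var) a = result_var;  // redirect uses of the old live-out var
    r.main_body.push_back(b);
    r.var_map[b.var] = b.var;
  }
  return r;
}
```

Calling `Outline(body, 3, 7, "outlined_pack", "%3")` on the original @main records %7 → %3. The outlined %3 through %6 get no entry, because they now only live inside the new function.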

This approach could be carried all the way back to the original import (or, e.g., there could be an additional map from input framework layer to Relay Var).
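Chaining this back to the import amounts to composing maps. Below is a hedged sketch. It assumes each step (the importer's layer-to-Var map, then each pass) emits a string-keyed `VarMap`. `Compose` and `ComposeAll` are illustrative helpers, not existing TVM functions.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch: each map is "name before a step -> equivalent name after it".
// The first map may key on input-framework layer names instead of Relay vars.
using VarMap = std::map<std::string, std::string>;

// Composes first (A -> B) with second (B -> C) into A -> C. Entries whose
// intermediate var has no equivalent after `second` are dropped (best effort).
VarMap Compose(const VarMap& first, const VarMap& second) {
  VarMap out;
  for (const auto& kv : first) {
    auto it = second.find(kv.second);
    if (it != second.end()) out[kv.first] = it->second;
  }
  return out;
}

// Folds an import map followed by one map per pass, in pipeline order.
VarMap ComposeAll(const std::vector<VarMap>& maps) {
  if (maps.empty()) return {};
  VarMap acc = maps.front();
  for (std::size_t i = 1; i < maps.size(); ++i) acc = Compose(acc, maps[i]);
  return acc;
}
```

For instance, composing a hypothetical import map {"Pack" → "%7"} with the outliner's {"%7" → "%3"} gives {"Pack" → "%3"}. If a later pass drops %3 without recording an equivalent, the Pack entry disappears rather than pointing at a stale Var.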

SIBuilder takes as input a set of Exprs that bound the subgraph. Since most Relay programs are transformed in A-normal form, VarMap entries could substitute for these Exprs. This won’t work for all optimizations. But for a decently large class of them, I think we could apply SIBuilder automatically by walking the VarMap and applying Spans to the subgraphs whose endpoints are in the VarMap. An advantage of this technique is that the same approach could also be applied to TIR.
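Here is one way the automatic application might look. The sketch rests on these assumptions:

- The rewritten body is in A-normal form.
- Each run of bindings ending at a mapped Var was produced from that single original Var.
- `PropagateSpans` is a hypothetical helper standing in for SIBuilder.
- Spans are reduced to strings such as the source layer name.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical sketch of spans applied automatically from a VarMap; not SIBuilder's API.
using VarMap = std::map<std::string, std::string>;   // original var -> new var
using SpanMap = std::map<std::string, std::string>;  // var -> span (e.g. source layer name)

// new_order: vars of the rewritten body in binding (A-normal form) order.
// Each run of bindings ending at a mapped var inherits that original var's span.
SpanMap PropagateSpans(const std::vector<std::string>& new_order, const VarMap& var_map,
                       const SpanMap& original_spans) {
  std::map<std::string, std::string> new_to_orig;  // invert: new var -> original var
  for (const auto& kv : var_map) new_to_orig[kv.second] = kv.first;
  SpanMap out;
  std::vector<std::string> pending;  // bindings seen since the previous endpoint
  for (const auto& v : new_order) {
    pending.push_back(v);
    auto it = new_to_orig.find(v);
    if (it == new_to_orig.end()) continue;  // not an endpoint yet
    auto span = original_spans.find(it->second);
    if (span != original_spans.end())
      for (const auto& p : pending) out[p] = span->second;  // label the whole subgraph
    pending.clear();
  }
  return out;  // bindings after the last endpoint stay unlabeled
}
```

Bindings after the last mapped Var are left without a span, which matches the best-effort spirit of the scheme. If two original Vars map to the same new Var, the inverted map keeps only one of them.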

I think you’d need to assert that the Relay or TIR graph can be partitioned along the VarMap for this to work, so I’m not saying it would work for all transforms. But I do think it would work for many. It’s also worth noting that this is a best-effort tracking scheme: some Vars could simply be eliminated, e.g. by operator fusion. In these cases, the VarMap may not contain every Var from the original model.
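Because tracking is best-effort, a consumer such as the explorer UI would probably want to know which original Vars lost their equivalents. Below is a small sketch of that check, reusing the same hypothetical string-keyed `VarMap`. `UntrackedVars` and `TrackedFraction` are illustrative names.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

using VarMap = std::map<std::string, std::string>;  // original var -> equivalent var now

// Returns the original vars with no equivalent after the passes ran (for
// example, eliminated by fusion), in their original order.
std::vector<std::string> UntrackedVars(const std::vector<std::string>& original_vars,
                                       const VarMap& var_map) {
  std::vector<std::string> missing;
  for (const auto& v : original_vars)
    if (var_map.find(v) == var_map.end()) missing.push_back(v);
  return missing;
}

// Fraction of original vars that still have an equivalent.
double TrackedFraction(const std::vector<std::string>& original_vars, const VarMap& var_map) {
  assert(!original_vars.empty());
  double untracked = static_cast<double>(UntrackedVars(original_vars, var_map).size());
  return 1.0 - untracked / static_cast<double>(original_vars.size());
}
```

For the outlining example, %3 through %6 of the original @main are reported as untracked. Only %7 has an equivalent (%3) in the new @main, so half of the eight original Vars remain tracked.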

chunit:

Runtime performance

| Function | Without span filling | With span filling | With span filling & schedule_record |
|---|---|---|---|
| relay.frontend.from_tflite() | 133174.0 us | 176468.0 us (↑32.51%) | 177774.0 us (↑33.49%) |
| relay.build() | 7480367.0 us | 7558526.0 us (↑1.045%) | 7580165.0 us (↑1.334%) |

Memory usage

| Function | Without span filling | With span filling | With span filling & schedule_record |
|---|---|---|---|
| relay.frontend.from_tflite() | 26.105 MiB | 26.203 MiB (↑0.375%) | 26.211 MiB (↑0.406%) |
| relay.build() | 147.762 MiB | 148.148 MiB (↑0.261%) | 148.418 MiB (↑0.443%) |

We also provide options to disable span filling and schedule recording if users don’t need them.
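As a sanity check on the tables, the percentages are relative overheads against the "Without span filling" column, i.e. (with - without) / without * 100. The sketch below reproduces the reported figures to within rounding. `OverheadPercent` is an illustrative name.

```cpp
#include <cassert>

// Relative overhead, in percent, of `with_value` over the `baseline` measurement.
double OverheadPercent(double baseline, double with_value) {
  assert(baseline > 0.0);
  return (with_value - baseline) / baseline * 100.0;
}
```

For example, `OverheadPercent(133174.0, 176468.0)` comes out at about 32.51, matching the relay.frontend.from_tflite() runtime row.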

Thanks for providing this data! The overhead seems reasonable, at least when enabled via a debug option!
