No worries.
Thank you very much for helping us!
If you don't mind, I would like to submit more materials and ask some questions about the Var idea you just mentioned.
> Right now, this needs to be done per-pass.
Yes, we attach spans per-pass based on `SequentialSpan` and `SIBuilder`. It is a time-consuming task. We have completed the following passes so far; all of them are invoked during the build flow, and we will work through the remaining ones.
| RelayPass | TIRPass | TIRPass (not yet done) |
|---|---|---|
| AlterOpLayout | LowerInitBlock | BF16Legalize |
| AutoSchedulerLayoutRewrite | LowerIntrin | CombineContextCall |
| CanonicalizeCast | MakePackedAPI | CompactBufferAllocation |
| CanonicalizeOps | MakeUnpackedAPI | ConvertBlocksToOpaque |
| CombineParallelBatchMatmul | NarrowDataType | FlattenBuffer |
| CombineParallelConv2D | PlanAndUpdateBufferAllocationLocation | HoistIfThenElse |
| CombineParallelDense | RemoveNoOp | InferFragment |
| DefuseOps | RewriteUnsafeSelect | InjectDoubleBuffer |
| DynamicToStatic | SplitHostDevice | InjectPrefetch |
| EliminateCommonSubexpr | InjectVirtualThread | |
| EtaExpand | InstrumentBoundCheckers | |
| FastMath | LoopPartition | |
| FoldConstant | LowerCustomDatatypes | |
| FoldScaleAxis | LowerDeviceStorageAccessInfo | |
| FuseOps | LowerMatchBuffer | |
| InferType | LowerTVMBuiltin | |
| Inline | LowerThreadAllreduce | |
| SplitArgs | LowerWarpMemory | |
| LabelOps | MergeDynamicSharedMemoryAllocations | |
| Legalize | Simplify | |
| RemoveUnusedFunctions | StorageFlatten | |
| SimplifyExpr | StorageRewrite | |
| SimplifyInference | TextureFlatten | |
| ToBasicBlockNormalForm | ThreadSync | |
| relay::qnn::transform::Legalize | UnifyThreadBinding | |
| | UnrollLoop | |
| | VectorizeLoop | |
| | VerifyMemory | |
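To make the per-pass work above more concrete: for each pass, the task is deciding which source spans every transformed expression inherits. The following is only a simplified plain-Python illustration of that bookkeeping; `Span`, `SequentialSpan`, and `merge_spans` here are stand-ins for the real TVM helpers, not their actual APIs.

```python
# Simplified illustration of per-pass span propagation: when a pass fuses
# or rewrites several expressions into one, the new expression keeps the
# ordered, concatenated spans of all of its sources.
from dataclasses import dataclass

@dataclass(frozen=True)
class Span:
    source: str   # e.g. the frontend layer name
    line: int

@dataclass(frozen=True)
class SequentialSpan:
    spans: tuple  # ordered spans of every expression merged by the pass

def merge_spans(*exprs):
    """Collect the spans of all input expressions into one SequentialSpan."""
    collected = []
    for e in exprs:
        if isinstance(e, SequentialSpan):
            collected.extend(e.spans)
        else:
            collected.append(e)
    return SequentialSpan(tuple(collected))

# A fusion-like pass combines conv2d and bias_add into one fused op:
conv_span = Span("conv2d_1", 10)
bias_span = Span("bias_add_1", 11)
fused = merge_spans(conv_span, bias_span)
print([s.source for s in fused.spans])  # ['conv2d_1', 'bias_add_1']
```

The tedious part is that each pass merges expressions differently, so this decision has to be encoded pass by pass.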
> I wonder if we could get away with doing this once at the end of compilation if we also attach references to the frontend layer (or post-import variable) to each Relay Var.
It would be quite convenient if this could be done once at the end of compilation! Sorry, I am not quite following here. Could you explain again, perhaps with an example of:
- What does attaching references to the frontend layer look like?
- What exactly should be attached to a Relay Var?
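While waiting for your reply, here is our rough guess at what "attaching a reference to the frontend layer" might mean, using plain-Python stand-ins rather than the real Relay classes; please correct us if this is not what you had in mind.

```python
# Our guess (plain-Python stand-ins, NOT real Relay classes): each Relay Var
# carries a reference back to the frontend object it was imported from, so a
# single walk at the end of compilation can recover layer names from Vars.
from dataclasses import dataclass

@dataclass
class FrontendLayer:
    name: str      # e.g. the TF/TFLite layer name
    op_type: str

@dataclass
class RelayVar:
    name_hint: str
    frontend_ref: FrontendLayer = None  # the attached back-reference

# At import time the converter records where each Var came from:
layer = FrontendLayer(name="stack", op_type="Pack")
var = RelayVar(name_hint="v7", frontend_ref=layer)

# At the end of compilation the reference can simply be read back:
print(var.frontend_ref.name)  # stack
```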
> It seems like by annotating Var we might be able to add this information.
We would like some more explanation about this part. Besides Vars and params, the problem also occurs in one-to-many conversions. Take the Pack op from TF as an example again: currently we fill the layer name into the converted IR like this:
```
def @main (%input: Tensor[(?, ?, 3, 1), float32]) {
  %0 = shape_of(%input, dtype="int32") /* Shape */;
  %1 = strided_slice(%0, …) /* strided_slice */;
  %2 = squeeze(%1) /* strided_slice */;
  # the Pack op conversion starts here
  %3 = expand_dims(%2, axis=0) /* stack */;
  %4 = expand_dims(3, axis=0) /* stack */;
  %5 = expand_dims(3, axis=0) /* stack */;
  %6 = (%3, %4, %5) /* stack */;
  %7 = concatenate(%6) /* stack */;
}
```
And here is the result from the former patch:
```
def @main (%input: Tensor[(?, ?, 3, 1), float32]) {
  %0 = shape_of(%input, dtype="int32") /* Shape */;
  %1 = strided_slice(%0, begin=[0], end=[1], strides=[1], axes=None) /* strided_slice_PART_0 */;
  %2 = squeeze(%1) /* strided_slice */;
  %3 = expand_dims(%2, axis=0) /* stack_PART_0 */;
  %4 = expand_dims(3, axis=0) /* stack_PART_1 */;
  %5 = expand_dims(3, axis=0) /* stack_PART_2 */;
  %6 = (%3, %4, %5) /* stack_PART_3 */;
  %7 = concatenate(%6) /* stack */;
}
```
In the former patch we could immediately identify the computation output of the Pack op, because no suffix was added to it. We have since removed the suffixes, because we noticed that the "_PART_" suffix becomes annoying and misleading after pass transformations.
The drawback of the current version is that we cannot tell which expression is the computation output, because they all look the same. Perhaps we could do something like the following example, but we are still looking for a better solution.
```
def @main (%input: Tensor[(?, ?, 3, 1), float32]) {
  %0 = shape_of(%input, dtype="int32") /* Shape */;
  %1 = strided_slice(%0, …) /* strided_slice */;
  %2 = squeeze(%1) /* strided_slice */;
  # the Pack op conversion starts here
  %3 = expand_dims(%2, axis=0) /* stack */;
  %4 = expand_dims(3, axis=0) /* stack */;
  %5 = expand_dims(3, axis=0) /* stack */;
  %6 = (%3, %4, %5) /* stack */;
  %7 = concatenate(%6) /* stack_OUTPUT */;
}
```
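On the converter side, the "_OUTPUT" suffix idea could be implemented roughly as follows. This is only a plain-Python sketch of the tagging logic; `tag_spans` and the string expressions are illustrative, not actual TVM frontend APIs.

```python
# Sketch of tagging a one-to-many conversion: every intermediate expression
# produced for one frontend op gets the plain layer name, and only the final
# expression (the computation output) gets an "_OUTPUT"-suffixed span so it
# can still be identified after pass transformations.
def tag_spans(exprs, layer_name):
    """exprs: ordered list of expressions emitted for one frontend op."""
    tagged = []
    for i, expr in enumerate(exprs):
        is_output = (i == len(exprs) - 1)
        span = layer_name + "_OUTPUT" if is_output else layer_name
        tagged.append((expr, span))
    return tagged

# The five expressions emitted for the TF Pack op in the example above:
exprs = ["expand_dims_0", "expand_dims_1", "expand_dims_2",
         "tuple", "concatenate"]
for expr, span in tag_spans(exprs, "stack"):
    print(f"{expr} /* {span} */")
# last line printed: concatenate /* stack_OUTPUT */
```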
> It's harder to fill span information back up through the compiler since the layer variables have changed. I'm curious if you guys tried to apply this to any TIR-based fusion?
We are still working on the TIR passes, as shown in the list above. Besides, we have not yet done the propagation from Relay to TE or TIR, because that is also a tough part for us.
Things are not too complicated in the Relay environment, but they become harder as we go down to lower IRs like TE and TIR. Currently we still rely on the layer name, but we are thinking that using row and column numbers could be more robust and more indicative.
If we had a precise definition of the line-number information of an IRModule, we could at least have a better mapping relationship before and after a pass.
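As a toy illustration of the row-and-column idea: if the printed IRModule had a stable, well-defined line numbering, each pass could emit a mapping from output lines back to input lines, much like a source map. The following is purely illustrative plain Python (the "pass" just fuses two adjacent lines; the bookkeeping pattern is the point, not the pass itself).

```python
# Toy "source map" for an IR pass: record, for each line of the transformed
# module, which line(s) of the original module it came from.
def fuse_pass_with_map(lines):
    out_lines, src_map = [], {}
    i = 0
    while i < len(lines):
        if i + 1 < len(lines) and lines[i + 1].startswith("bias_add"):
            # fuse a producer with the following bias_add into one line
            out_lines.append(f"fused({lines[i]}, {lines[i + 1]})")
            src_map[len(out_lines) - 1] = [i, i + 1]
            i += 2
        else:
            out_lines.append(lines[i])
            src_map[len(out_lines) - 1] = [i]
            i += 1
    return out_lines, src_map

before = ["conv2d(%x)", "bias_add(%0)", "relu(%1)"]
after, mapping = fuse_pass_with_map(before)
print(after)    # ['fused(conv2d(%x), bias_add(%0))', 'relu(%1)']
print(mapping)  # {0: [0, 1], 1: [2]}
```

With such a map per pass, the maps could be composed across the whole pipeline to trace any final expression back to its original line.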
> Lastly, any idea how much additional memory this takes or performance impact?
Yes. Taking mobilenet_v1_2018_08_02 as an example, here are the profiling results:
Runtime performance

| function | without span filling | with span filling | with span filling & schedule_record |
|---|---|---|---|
| relay.frontend.from_tflite() | 133174.0 us | 176468.0 us (+32.51%) | 177774.0 us (+33.49%) |
| relay.build() | 7480367.0 us | 7558526.0 us (+1.045%) | 7580165.0 us (+1.334%) |
Memory usage

| function | without span filling | with span filling | with span filling & schedule_record |
|---|---|---|---|
| relay.frontend.from_tflite() | 26.105 MiB | 26.203 MiB (+0.375%) | 26.211 MiB (+0.406%) |
| relay.build() | 147.762 MiB | 148.148 MiB (+0.261%) | 148.418 MiB (+0.443%) |
We also provide options to disable span filling and schedule recording if users do not need them.