The recently merged CUTLASS BYOC relies on C-codegen based BYOC infra to JIT generate and compile C++ template classes.
Currently it doesn’t support Constants embedded in an external function; instead, it requires all weight and bias parameters to be passed in at runtime. This caused a problem when I applied CUTLASS BYOC to a real model: I need to run constant folding to turn fp32 bias parameters into fp16, both for pattern matching and so that fp16 tensors are sent to CUTLASS. For that, I need to bind parameters to the module with bind_params_by_name, which embeds constants into the external functions like this, which CUTLASS BYOC does not support right now:
def @tvmgen_default_cutlass_main_267(%cutlass_267_i0: Tensor[(1024, 1024), float16], %cutlass_267_i1: Tensor[(4096, 1024), float16], Inline=1, Compiler="cutlass", global_symbol="tvmgen_default_cutlass_main_267", Primitive=1) -> Tensor[(1024, 4096), float16] {
%9 = fn (%FunctionVar_8_0: Tensor[(1024, 1024), float16], %FunctionVar_8_1: Tensor[(4096, 1024), float16], %FunctionVar_8_2: Tensor[(4096), float16], PartitionedFromPattern="nn.dense_add_multiply_cast_erf_cast_multiply_add_multiply_", Composite="cutlass.dense_bias_gelu_fp16") -> Tensor[(1024, 4096), float16] {
%1 = nn.dense(%FunctionVar_8_0, %FunctionVar_8_1, units=None, out_dtype="float16") /* ty=Tensor[(1024, 4096), float16] */;
%2 = add(%1, %FunctionVar_8_2) /* ty=Tensor[(1024, 4096), float16] */;
%3 = multiply(%2, meta[relay.Constant][0] /* ty=float16 */) /* ty=Tensor[(1024, 4096), float16] */;
%4 = cast(%3, dtype="float32") /* ty=Tensor[(1024, 4096), float32] */;
%5 = erf(%4) /* ty=Tensor[(1024, 4096), float32] */;
%6 = cast(%5, dtype="float16") /* ty=Tensor[(1024, 4096), float16] */;
%7 = multiply(%6, meta[relay.Constant][1] /* ty=float16 */) /* ty=Tensor[(1024, 4096), float16] */;
%8 = add(%7, meta[relay.Constant][2] /* ty=float16 */) /* ty=Tensor[(1024, 4096), float16] */;
multiply(%8, %2) /* ty=Tensor[(1024, 4096), float16] */
};
// meta[relay.Constant][3] is the bias constant, not supported by CUTLASS BYOC for now
%9(%cutlass_267_i0, %cutlass_267_i1, meta[relay.Constant][3] /* ty=Tensor[(4096), float16] */) /* ty=Tensor[(1024, 4096), float16] */
}
So I now need to deal with Constants. I think embedding all constants into the C source is infeasible for models like BERT-large, which I’m working with. An alternative I can think of is to somehow “unbind” the constants after constant folding. But this requires modifying the signatures of the external functions and passing additional parameters inside the main module, and I don’t see an easy way to achieve that.
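To make the “unbind” idea concrete, here is a minimal sketch on a toy expression tree in plain Python, not on Relay. Var, Const, Call, Function and lift_constants are all hypothetical names made up for illustration; none of them are TVM APIs. The sketch only shows the two things that would have to change together: the external function gains one parameter per lifted constant, and the call site in the main module passes those constants as extra arguments.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


# Toy IR nodes (hypothetical, for illustration only; this is not Relay).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Const:
    value: tuple  # stand-in for tensor data


@dataclass(frozen=True)
class Call:
    op: str
    args: tuple


Expr = Union[Var, Const, Call]


@dataclass
class Function:
    params: List[Var]
    body: Expr


def lift_constants(func: Function, prefix: str = "lifted_const") -> Tuple[Function, List[Const]]:
    """Replace every Const in the body with a fresh parameter.

    Returns the rewritten function and the lifted constants, in the same
    order as the parameters appended to the signature.
    """
    lifted: List[Const] = []
    new_params = list(func.params)

    def visit(expr: Expr) -> Expr:
        if isinstance(expr, Const):
            var = Var(f"{prefix}_{len(lifted)}")
            lifted.append(expr)
            new_params.append(var)
            return var
        if isinstance(expr, Call):
            return Call(expr.op, tuple(visit(arg) for arg in expr.args))
        return expr  # Var: leave untouched

    new_body = visit(func.body)
    return Function(new_params, new_body), lifted


# An "external function" with an embedded bias constant.
bias = Const((0.5, 0.25))
ext = Function(
    [Var("i0"), Var("i1")],
    Call("dense_bias", (Var("i0"), Var("i1"), bias)),
)

new_ext, lifted = lift_constants(ext)

# The call site in "main" must be extended with the lifted constants.
call_args = (Var("data"), Var("weight"))
new_call_args = call_args + tuple(lifted)
```

In a real pass one would presumably lift only tensor constants such as the bias and leave the scalar constants inside the composite pattern alone, and the hard part I mention above, rewriting both the external function and its caller consistently inside a Relay module, is exactly what this toy version glosses over.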
My questions:
- Is there a good way to deal with Constants in C-source codegen based BYOC? Has there been any improvement since the discussions from last year, such as [External Codegen] Constant tensors in c-codegen and https://github.com/apache/tvm/pull/5310? (also cc @lhutton1 @manupa-arm @mbaret)
- Should the CUTLASS codegen switch to the JSON runtime, which I believe has no issues with constants? How can we compile the generated C source with JSON-based BYOC? cc @Laurawly @comaniac @zhiics